Quick Links

Re: incorrect handling of the timeout in pg_receivexlog

Lists:	pgsql-hackers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 05:58:11
Message-ID:	CAHGQGwFbqTDBrw95jek6_RvYG2=E-6o0HOpbeEcP6oWHJTLkUw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

When I compiled HEAD with --disable-integer-datetimes and tested
pg_receivexlog, I encountered unexpected replication timeout. As
far as I read the pg_receivexlog code, the cause of this problem is
that pg_receivexlog handles the standby message timeout incorrectly
in --disable-integer-datetimes. The attached patch fixes this problem.
Comments?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
timeout_handling_v1.patch	text/x-diff	1.6 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 07:03:42
Message-ID:	CAHGQGwFmWE8qjJ7L_WwdZtxKjXYXyZgGXo6+4ZV6BNb8A6NUrA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> When I compiled HEAD with --disable-integer-datetimes and tested
> pg_receivexlog, I encountered unexpected replication timeout. As
> far as I read the pg_receivexlog code, the cause of this problem is
> that pg_receivexlog handles the standby message timeout incorrectly
> in --disable-integer-datetimes. The attached patch fixes this problem.
> Comments?

receivelog.c
-------
timeout.tv_sec = last_status + standby_message_timeout - now - 1;
if (timeout.tv_sec <= 0)
-------

Umm.. the above code also handles the timestamp incorrectly. ISTM that the
root cause of these problems is that receivelog.c uses TimestampTz. What
about changing receivelog.c so that it uses time_t instead of TimestampTz?
Which would make the code simpler, I think.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 09:31:19
Message-ID:	4F30EF67.4030405@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 07.02.2012 09:03, Fujii Masao wrote:
> On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> When I compiled HEAD with --disable-integer-datetimes and tested
>> pg_receivexlog, I encountered unexpected replication timeout. As
>> far as I read the pg_receivexlog code, the cause of this problem is
>> that pg_receivexlog handles the standby message timeout incorrectly
>> in --disable-integer-datetimes. The attached patch fixes this problem.
>> Comments?
>
> receivelog.c
> -------
> timeout.tv_sec = last_status + standby_message_timeout - now - 1;
> if (timeout.tv_sec<= 0)
> -------
>
> Umm.. the above code also handles the timestamp incorrectly. ISTM that the
> root cause of these problems is that receivelog.c uses TimestampTz.

Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles
float timestamps correctly, the caller just assigns the result to a
int64 variable, assuming --enable-integer-datetimes.

> What about changing receivelog.c so that it uses time_t instead of
> TimestampTz? Which would make the code simpler, I think.

Hmm, that would reduce granularity to seconds. The --statusint option is
given in seconds, but it would be good to have more precision in the
calculations to avoid rounding errors.

But actually, if the purpose of the --statusint option is to avoid
disconnection because of exceeding the server's replication_timeout, one
second granularity just isn't enough to be begin with.
replication_timeout is given in milliseconds, so if you set
replication_timeout=900ms in the server, there is no way to make
pg_basebackup/pg_receivexlog to reply in time.

So, --statusint needs to be in milliseconds. And while we're at it, how
difficult would be to ask the server for the current value of
replication_timeout, and set --statusint automatically based on that? Or
perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
depending on a server setting, you need to pass an option in the client
or it will just silently fail with no indication of what the problem is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 09:35:23
Message-ID:	CABUevEyNa7jr6hNT6e5A5-=3Qkhztiqn+3h3se3PLRpL-6f6zQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 07.02.2012 09:03, Fujii Masao wrote:
>>
>> On Tue, Feb 7, 2012 at 2:58 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> When I compiled HEAD with --disable-integer-datetimes and tested
>>>
>>> pg_receivexlog, I encountered unexpected replication timeout. As
>>> far as I read the pg_receivexlog code, the cause of this problem is
>>> that pg_receivexlog handles the standby message timeout incorrectly
>>> in --disable-integer-datetimes. The attached patch fixes this problem.
>>> Comments?
>>
>>
>> receivelog.c
>> -------
>> timeout.tv_sec = last_status + standby_message_timeout - now - 1;
>> if (timeout.tv_sec<= 0)
>> -------
>>
>> Umm.. the above code also handles the timestamp incorrectly. ISTM that the
>> root cause of these problems is that receivelog.c uses TimestampTz.
>
>
> Yep. While localGetCurrentTimestamp() returns a TimestampTz and handles
> float timestamps correctly, the caller just assigns the result to a int64
> variable, assuming --enable-integer-datetimes.

Ugh. Indeed.

>> What about changing receivelog.c so that it uses time_t instead of
>> TimestampTz? Which would make the code simpler, I think.
>
>
> Hmm, that would reduce granularity to seconds. The --statusint option is
> given in seconds, but it would be good to have more precision in the
> calculations to avoid rounding errors.
>
> But actually, if the purpose of the --statusint option is to avoid
> disconnection because of exceeding the server's replication_timeout, one
> second granularity just isn't enough to be begin with. replication_timeout
> is given in milliseconds, so if you set replication_timeout=900ms in the
> server, there is no way to make pg_basebackup/pg_receivexlog to reply in
> time.
>
> So, --statusint needs to be in milliseconds. And while we're at it, how
> difficult would be to ask the server for the current value of
> replication_timeout, and set --statusint automatically based on that? Or
> perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
> depending on a server setting, you need to pass an option in the client or
> it will just silently fail with no indication of what the problem is.

We can't really ask for it easily, since we're on a replication
connection. Unless we add that to the walsender grammar, but that
would make it impossible to use the receivexlog stuff against a 9.1
server (which I think still works, though I haven't tested it in a
while).

Do we have a facility to make it a GUC_REPORT but only for walsender
connections? It seems like a very unnecessary thing to have it sent
out over every single connection, since it would only be useful in a
very small subset of them.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 09:55:14
Message-ID:	4F30F502.5090801@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 07.02.2012 11:35, Magnus Hagander wrote:
> On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> So, --statusint needs to be in milliseconds. And while we're at it, how
>> difficult would be to ask the server for the current value of
>> replication_timeout, and set --statusint automatically based on that? Or
>> perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
>> depending on a server setting, you need to pass an option in the client or
>> it will just silently fail with no indication of what the problem is.
>
> We can't really ask for it easily, since we're on a replication
> connection. Unless we add that to the walsender grammar, but that
> would make it impossible to use the receivexlog stuff against a 9.1
> server (which I think still works, though I haven't tested it in a
> while).

You could put a version-check there, and only send the command if
connected to a 9.2 server.

> Do we have a facility to make it a GUC_REPORT but only for walsender
> connections?

There's no such facility at the moment.

> It seems like a very unnecessary thing to have it sent out over every
> single connection, since it would only be useful in a very small
> subset of them.

True, and conversely, many of the current GUC_REPORT settings don't
apply to replication clients at all. Like standard_conforming_strings
and DateStyle.

I think we need another flag for settings that should be sent to
replication clients. GUC_REPORT_REPLICATION? Some settings would have
both GUC_REPORT and GUC_REPORT_REPLICATION, while others would have only
one of them.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 10:06:48
Message-ID:	CABUevEw8HwSPy=n674oZAmisrHFww7APHS3spdT8nn-sWUQ4-A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 7, 2012 at 10:55, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 07.02.2012 11:35, Magnus Hagander wrote:
>>
>> On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>
>>> So, --statusint needs to be in milliseconds. And while we're at it, how
>>>
>>> difficult would be to ask the server for the current value of
>>> replication_timeout, and set --statusint automatically based on that? Or
>>> perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
>>> depending on a server setting, you need to pass an option in the client
>>> or
>>> it will just silently fail with no indication of what the problem is.
>>
>>
>> We can't really ask for it easily, since we're on a replication
>> connection. Unless we add that to the walsender grammar, but that
>> would make it impossible to use the receivexlog stuff against a 9.1
>> server (which I think still works, though I haven't tested it in a
>> while).
>
>
> You could put a version-check there, and only send the command if connected
> to a 9.2 server.

True..

>> Do we have a facility to make it a GUC_REPORT but only for walsender
>> connections?
>
>
> There's no such facility at the moment.
>
>
>> It seems like a very unnecessary thing to have it sent out over every
>> single connection, since it would only be useful in a very small
>> subset of them.
>
>
> True, and conversely, many of the current GUC_REPORT settings don't apply to
> replication clients at all. Like standard_conforming_strings and DateStyle.

Right. But it's less important there, since the replication
connections will in any reasonable configuration be only a tiny part
of the connections.

> I think we need another flag for settings that should be sent to replication
> clients. GUC_REPORT_REPLICATION? Some settings would have both GUC_REPORT
> and GUC_REPORT_REPLICATION, while others would have only one of them.

Yup, seems like the cleanest solution.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 13:10:09
Message-ID:	CAHGQGwFutqnFPBYcHUCuoy1zMVDXto=o4OgsjrBWxW4zj2TCSw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 7, 2012 at 7:06 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Feb 7, 2012 at 10:55, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 07.02.2012 11:35, Magnus Hagander wrote:
>>>
>>> On Tue, Feb 7, 2012 at 10:31, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>>
>>>> So, --statusint needs to be in milliseconds. And while we're at it, how

Attached patch does that and fixes the problem caused under
--disable-integer-datetimes.

>>>>
>>>> difficult would be to ask the server for the current value of
>>>> replication_timeout, and set --statusint automatically based on that? Or
>>>> perhaps mark replication_timeout as GUC_REPORT. It is rather fiddly that
>>>> depending on a server setting, you need to pass an option in the client
>>>> or
>>>> it will just silently fail with no indication of what the problem is.
>>>
>>>
>>> We can't really ask for it easily, since we're on a replication
>>> connection. Unless we add that to the walsender grammar, but that
>>> would make it impossible to use the receivexlog stuff against a 9.1
>>> server (which I think still works, though I haven't tested it in a
>>> while).
>>
>>
>> You could put a version-check there, and only send the command if connected
>> to a 9.2 server.
>
> True..
>
>
>>> Do we have a facility to make it a GUC_REPORT but only for walsender
>>> connections?
>>
>>
>> There's no such facility at the moment.
>>
>>
>>> It seems like a very unnecessary thing to have it sent out over every
>>> single connection, since it would only be useful in a very small
>>> subset of them.
>>
>>
>> True, and conversely, many of the current GUC_REPORT settings don't apply to
>> replication clients at all. Like standard_conforming_strings and DateStyle.
>
> Right. But it's less important there, since the replication
> connections will in any reasonable configuration be only a tiny part
> of the connections.
>
>
>> I think we need another flag for settings that should be sent to replication
>> clients. GUC_REPORT_REPLICATION? Some settings would have both GUC_REPORT
>> and GUC_REPORT_REPLICATION, while others would have only one of them.
>
> Yup, seems like the cleanest solution.

Agreed. But when replication_timeout is changed by SIGHUP in the server,
there would be a delay before pg_receivexlog receives the parameter
change packet and changes the standby message interval based on the
new value of replication_timeout. Which might cause unexpected replication
timeout. So the user-settable interval (i.e., --statusint) is still useful for
people who want to avoid such fragility, even if we will support the auto-
tuning of the interval.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment	Content-Type	Size
timeout_handling_v2.patch	text/x-diff	9.9 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 14:55:14
Message-ID:	6616.1328626514@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 07.02.2012 09:03, Fujii Masao wrote:
>> What about changing receivelog.c so that it uses time_t instead of
>> TimestampTz? Which would make the code simpler, I think.

> Hmm, that would reduce granularity to seconds.

It also creates portability issues that I'd just as soon not deal with,
ie, time_t is not the same width on all platforms. (The integer vs
float TimestampTz issue is a kind of portability problem, but we've
already bought into the assumption that sender and receiver must be
built with the same choice, no?)

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Magnus Hagander <magnus(at)hagander(dot)net>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 16:29:42
Message-ID:	4F315176.7010001@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 07.02.2012 16:55, Tom Lane wrote:
> (The integer vs float TimestampTz issue is a kind of portability
> problem, but we've already bought into the assumption that sender and
> receiver must be built with the same choice, no?)

Hmm, true. In hindsight, I think that was a bad choice, but it's a bit
late to change that. pg_basebackup doesn't otherwise care about the
integer/float timestamps, but it does send a timestamp back to the
server. You won't be able to actually start up the database if the
config options don't match, but I think it would be good if
pg_basebackup still worked across platforms and versions. For example,
you might have a central backup server that calls pg_basebackup on
several database servers, running on different platforms.

In 9.0, the only field in the protocol that depends on timestamp format
is WalDataMessageHeader->sendTime. That goes from server to client, and
pg_basebackup/pg_receivexlog don't care about that. In 9.1 we introduced
StandbyReplyMessage->sendTime, which is sent from client to server, but
looking at the code it looks like the server doesn't use it for
anything. In 9.2, we added WalSndrMessage->sendTime, which is used by a
standby server to calculate how far behind the standby is.

I'm tempted to just change all of those TimestampTz fields to something
that's independent of integer/float timestamp setting, in 9.2. At a
quick glance, it seems that it wouldn't break anything.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-07 16:33:39
Message-ID:	CABUevEwwJ-A7XFhT2adBp9ZDyQE9ADfLJVe9szayCsgKJ1tasA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 7, 2012 at 17:29, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 07.02.2012 16:55, Tom Lane wrote:
>>
>> (The integer vs float TimestampTz issue is a kind of portability
>> problem, but we've already bought into the assumption that sender and
>> receiver must be built with the same choice, no?)
>
>
> Hmm, true. In hindsight, I think that was a bad choice, but it's a bit late
> to change that. pg_basebackup doesn't otherwise care about the integer/float
> timestamps, but it does send a timestamp back to the server. You won't be
> able to actually start up the database if the config options don't match,
> but I think it would be good if pg_basebackup still worked across platforms
> and versions. For example, you might have a central backup server that calls
> pg_basebackup on several database servers, running on different platforms.
>
> In 9.0, the only field in the protocol that depends on timestamp format is
> WalDataMessageHeader->sendTime. That goes from server to client, and
> pg_basebackup/pg_receivexlog don't care about that. In 9.1 we introduced
> StandbyReplyMessage->sendTime, which is sent from client to server, but
> looking at the code it looks like the server doesn't use it for anything. In
> 9.2, we added WalSndrMessage->sendTime, which is used by a standby server to
> calculate how far behind the standby is.
>
> I'm tempted to just change all of those TimestampTz fields to something
> that's independent of integer/float timestamp setting, in 9.2. At a quick
> glance, it seems that it wouldn't break anything.

In general, I think that would work. Since we can't replicate across
versions anyway.

Will it break using pg_basebackup 9.2 on a 9.1 server, though? that
would also be very useful in the scenario of the central server...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-02-28 09:08:25
Message-ID:	CAHGQGwHbPfABCPsuYqidw2xBU_MJJLO200EcH33SXiBWbdoyOg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 8, 2012 at 1:33 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Feb 7, 2012 at 17:29, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 07.02.2012 16:55, Tom Lane wrote:
>>>
>>> (The integer vs float TimestampTz issue is a kind of portability
>>> problem, but we've already bought into the assumption that sender and
>>> receiver must be built with the same choice, no?)
>>
>>
>> Hmm, true. In hindsight, I think that was a bad choice, but it's a bit late
>> to change that. pg_basebackup doesn't otherwise care about the integer/float
>> timestamps, but it does send a timestamp back to the server. You won't be
>> able to actually start up the database if the config options don't match,
>> but I think it would be good if pg_basebackup still worked across platforms
>> and versions. For example, you might have a central backup server that calls
>> pg_basebackup on several database servers, running on different platforms.
>>
>> In 9.0, the only field in the protocol that depends on timestamp format is
>> WalDataMessageHeader->sendTime. That goes from server to client, and
>> pg_basebackup/pg_receivexlog don't care about that. In 9.1 we introduced
>> StandbyReplyMessage->sendTime, which is sent from client to server, but
>> looking at the code it looks like the server doesn't use it for anything. In
>> 9.2, we added WalSndrMessage->sendTime, which is used by a standby server to
>> calculate how far behind the standby is.
>>
>> I'm tempted to just change all of those TimestampTz fields to something
>> that's independent of integer/float timestamp setting, in 9.2. At a quick
>> glance, it seems that it wouldn't break anything.

Agreed. If we'll have not pushed such change into 9.2, we would break
something later.

> In general, I think that would work. Since we can't replicate across
> versions anyway.
>
> Will it break using pg_basebackup 9.2 on a 9.1 server, though? that
> would also be very useful in the scenario of the central server...

No unless I'm missing something. Because pg_basebackup doesn't use
any message which is defined in walprotocol.h if "-x stream" option is
not specified.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-03-29 04:43:49
Message-ID:	CAHGQGwFToYdUP1RP9PpKW7W=y=VhffMWawpMMTX4=aPTJ9NZMg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 28, 2012 at 6:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Feb 8, 2012 at 1:33 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Will it break using pg_basebackup 9.2 on a 9.1 server, though? that
>> would also be very useful in the scenario of the central server...
>
> No unless I'm missing something. Because pg_basebackup doesn't use
> any message which is defined in walprotocol.h if "-x stream" option is
> not specified.

No, this is not right at all :( Changing TimestampTz fields in 9.2 would break
that use case.

If we support that use case, pg_basebackup 9.2 must know which an integer
or a double is used for TimestampTz in 9.1 server. Otherwise pg_basebackup
cannot process a WAL data message proporly. But unfortunately there is no
way for pg_basebackup 9.2 to know that... 9.1 has no API to report the actual
datatype of its TimestampTz field.

One idea to support that use case is to add new command-line option into
pg_basebackup, which specifies the datatype of TimestampTz field. You can
use one pg_basebackup 9.2 executable on 9.1 server whether
--disable-integer-datetimes is specified or not. But I'm not really sure if it's
worth doing this, because ISTM that it's rare to build a server and a
client with
the different choice about TimestampTz datatype.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-10 13:04:21
Message-ID:	CABUevExPiAv5Po+F6NkfDFDF_qkv0PzvFAB+HjyTavrfAxf4JQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Argh. This thread appears to have been forgotten - sorry about that.

Given that we're taling about a potential protocol change, we really
should resolve this before we wrap beta, no?

On Thu, Mar 29, 2012 at 6:43 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Feb 28, 2012 at 6:08 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Feb 8, 2012 at 1:33 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> Will it break using pg_basebackup 9.2 on a 9.1 server, though? that
>>> would also be very useful in the scenario of the central server...
>>
>> No unless I'm missing something. Because pg_basebackup doesn't use
>> any message which is defined in walprotocol.h if "-x stream" option is
>> not specified.
>
> No, this is not right at all :( Changing TimestampTz fields in 9.2 would break
> that use case.
>
> If we support that use case, pg_basebackup 9.2 must know which an integer
> or a double is used for TimestampTz in 9.1 server. Otherwise pg_basebackup
> cannot process a WAL data message proporly. But unfortunately there is no
> way for pg_basebackup 9.2 to know that... 9.1 has no API to report the actual
> datatype of its TimestampTz field.
>
> One idea to support that use case is to add new command-line option into
> pg_basebackup, which specifies the datatype of TimestampTz field. You can
> use one pg_basebackup 9.2 executable on 9.1 server whether
> --disable-integer-datetimes is specified or not. But I'm not really sure if it's
> worth doing this, because ISTM that it's rare to build a server and a
> client with
> the different choice about TimestampTz datatype.

I think that's survivable - but what will things look like if they do
mismatch? Can we detect "abnormal values" somewhere and at least abort
in a controlled fashion saying "sorry, wrong build flags"?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-10 14:43:37
Message-ID:	CABUevEw=CDieE5eFG7PJG8ME4XLgvRmR8gLOO6PvG6O9iQHo6w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 10, 2012 at 3:04 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Argh. This thread appears to have been forgotten - sorry about that.
>
> Given that we're taling about a potential protocol change, we really
> should resolve this before we wrap beta, no?

Had a chat with Heikki about this, and we came to the conslusion that
we don't actually have to fix it befor ebeta. Because pg_basebackup is
going to have to consider 9.1 servers anyway, and we can just treat
9.2beta1 as being a 9.1 from this perspective.

We still have to fix it, but it' snot as urgent :-)

FWIW, the main plan we're onto is still to add the GUCs on new
connections to walsender, so we have something to work with...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-10 14:51:14
Message-ID:	CABUevEyd-dgx-EjCLjwyT2Gvsx+mBBMgresVEc7mqDa3O+jEKQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 10, 2012 at 4:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Thu, May 10, 2012 at 3:04 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Argh. This thread appears to have been forgotten - sorry about that.
>>
>> Given that we're taling about a potential protocol change, we really
>> should resolve this before we wrap beta, no?
>
> Had a chat with Heikki about this, and we came to the conslusion that
> we don't actually have to fix it befor ebeta. Because pg_basebackup is
> going to have to consider 9.1 servers anyway, and we can just treat
> 9.2beta1 as being a 9.1 from this perspective.
>
> We still have to fix it, but it' snot as urgent :-)
>
> FWIW, the main plan we're onto is still to add the GUCs on new
> connections to walsender, so we have something to work with...

And taking this a step further - we *already* send these GUCs.
Previous references to us not doing that were incorrect :-)

So this should be a much easier fix than we thought. And can be done
entirely in pg_basebackup, meaning we don't need to worry about
beta...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-10 18:29:20
Message-ID:	CAHGQGwEdHYjWehfDN=6G9g5kFn_TZ5r3C3jxjL6uLxy0+2Wpgw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 10, 2012 at 11:51 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> And taking this a step further - we *already* send these GUCs.
> Previous references to us not doing that were incorrect :-)
>
> So this should be a much easier fix than we thought. And can be done
> entirely in pg_basebackup, meaning we don't need to worry about
> beta...

Sounds good!

Regards,

--
Fujii Masao

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-11 14:43:28
Message-ID:	CABUevEyjf5q__cAOZhqe_7pwj9U5wmtKiG2Xy9ErQ-S1sXQ3qg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thursday, May 10, 2012, Fujii Masao wrote:

> On Thu, May 10, 2012 at 11:51 PM, Magnus Hagander <magnus(at)hagander(dot)net<javascript:;>>
> wrote:
> > And taking this a step further - we *already* send these GUCs.
> > Previous references to us not doing that were incorrect :-)
> >
> > So this should be a much easier fix than we thought. And can be done
> > entirely in pg_basebackup, meaning we don't need to worry about
> > beta...
>
> Sounds good!
>
>
Should we go down the easy way and just reject connections when the flag is
mismatching between the client and the server (trivial to do - see the
attached patch)? Or should we try to implement both floating point and
integer in pg_basebackup, and make it work in either case? (We'd still have
to reject it if pg_basebackup was compiled without support for int64 at
all, of course, but the large majority of cases will have integer
timestamps these days, but could be made to support both integer and float
for the trivial operations that pg_basebackup actually does).

How common *is* it to have a build that doesn't have integer timestamps
these days? Does any of the binary builds do that at all, for example? If
it's uncommon enough, I think we should just go with the easy way out...

//Magnus

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Attachment	Content-Type	Size
validate_int_timestamps.patch	text/x-patch	1.1 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-11 15:44:00
Message-ID:	5579.1336751040@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> How common *is* it to have a build that doesn't have integer timestamps
> these days? Does any of the binary builds do that at all, for example? If
> it's uncommon enough, I think we should just go with the easy way out...

+1 for just rejecting a mismatch.

regards, tom lane

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-11 16:13:09
Message-ID:	CAHGQGwFYAZzq5cwgKSssiHMUrGdDi7uEYgVeC1BqWL6tEaEt1Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, May 12, 2012 at 12:44 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> How common *is* it to have a build that doesn't have integer timestamps
>> these days? Does any of the binary builds do that at all, for example? If
>> it's uncommon enough, I think we should just go with the easy way out...
>
> +1 for just rejecting a mismatch.

Agreed.

Regards,

--
Fujii Masao

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-14 18:24:15
Message-ID:	CAHGQGwFrhTrWbO==HDDdEdbGdMuR1qOEu17Xbe=_HYifAKXEDQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Should we go down the easy way and just reject connections when the flag is
> mismatching between the client and the server (trivial to do - see the
> attached patch)?

+ char *tmpparam;

You forgot to add "const" before "char", which causes a compile-time warning.

Regards,

--
Fujii Masao

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-22 14:04:25
Message-ID:	CA+TgmoYqTZhbz3Fnz8Evc23zKWbT5cLxCnHyibe2mn5zBfSdbA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, May 14, 2012 at 2:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Should we go down the easy way and just reject connections when the flag is
>> mismatching between the client and the server (trivial to do - see the
>> attached patch)?
>
> + char *tmpparam;
>
> You forgot to add "const" before "char", which causes a compile-time warning.

I went ahead and committed this, with this fix and a slight change to
the message text.

Hope that's OK with everyone...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Magnus Hagander <magnus(at)hagander(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-23 18:11:25
Message-ID:	CAHGQGwHLeRc0bnEscA=H3z6kobicXDDhtwqGTRMAEvxFwxjtbQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, May 22, 2012 at 11:04 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, May 14, 2012 at 2:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> Should we go down the easy way and just reject connections when the flag is
>>> mismatching between the client and the server (trivial to do - see the
>>> attached patch)?
>>
>> + char *tmpparam;
>>
>> You forgot to add "const" before "char", which causes a compile-time warning.
>
> I went ahead and committed this, with this fix and a slight change to
> the message text.

Thanks!

> Hope that's OK with everyone...

What about calling PQfinish() before exit() to avoid "unexpected EOF
connection" error?
Patch attached.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
pqfinish_before_exit_v1.patch	application/octet-stream	540 bytes

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-23 19:52:55
Message-ID:	CABUevExU6dgHO08n5ZE+BRNXgQqsJc-yHTYDa0dMrZVMcMZvsQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 23, 2012 at 8:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, May 22, 2012 at 11:04 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, May 14, 2012 at 2:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> Should we go down the easy way and just reject connections when the flag is
>>>> mismatching between the client and the server (trivial to do - see the
>>>> attached patch)?
>>>
>>> + char *tmpparam;
>>>
>>> You forgot to add "const" before "char", which causes a compile-time warning.
>>
>> I went ahead and committed this, with this fix and a slight change to
>> the message text.
>
> Thanks!
>
>> Hope that's OK with everyone...
>
> What about calling PQfinish() before exit() to avoid "unexpected EOF
> connection" error?
> Patch attached.

Makes sense, applied.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-05-25 17:56:27
Message-ID:	CAHGQGwHwsPcDaf0SeDokGg5d05yrFqwdJ2bRTCwDe3ATyebpDQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 24, 2012 at 4:52 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Wed, May 23, 2012 at 8:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, May 22, 2012 at 11:04 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Mon, May 14, 2012 at 2:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>> On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>> Should we go down the easy way and just reject connections when the flag is
>>>>> mismatching between the client and the server (trivial to do - see the
>>>>> attached patch)?
>>>>
>>>> + char *tmpparam;
>>>>
>>>> You forgot to add "const" before "char", which causes a compile-time warning.
>>>
>>> I went ahead and committed this, with this fix and a slight change to
>>> the message text.
>>
>> Thanks!
>>
>>> Hope that's OK with everyone...
>>
>> What about calling PQfinish() before exit() to avoid "unexpected EOF
>> connection" error?
>> Patch attached.
>
> Makes sense, applied.

Thanks! So, let's go back to the original problem: pg_receivexlog
still doesn't work fine
under --disable-integer-datetimes. I previously posted the patch which
fixes that problem.
http://archives.postgresql.org/message-id/CAHGQGwFutqnFPBYcHUCuoy1zMVDXto=o4OgsjrBWxW4zj2TCSw@mail.gmail.com

Attached is the updated version of the patch. Comments?

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
timeout_handling_v3.patch	application/octet-stream	9.9 KB

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-05 08:32:02
Message-ID:	CABUevEzAK48jaTQ_kLGekjvVapoaveCZke7Ub72qaQKh3pgJmQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, May 25, 2012 at 7:56 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, May 24, 2012 at 4:52 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Wed, May 23, 2012 at 8:11 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Tue, May 22, 2012 at 11:04 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> On Mon, May 14, 2012 at 2:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>> On Fri, May 11, 2012 at 11:43 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>>>> Should we go down the easy way and just reject connections when the flag is
>>>>>> mismatching between the client and the server (trivial to do - see the
>>>>>> attached patch)?
>>>>>
>>>>> + char *tmpparam;
>>>>>
>>>>> You forgot to add "const" before "char", which causes a compile-time warning.
>>>>
>>>> I went ahead and committed this, with this fix and a slight change to
>>>> the message text.
>>>
>>> Thanks!
>>>
>>>> Hope that's OK with everyone...
>>>
>>> What about calling PQfinish() before exit() to avoid "unexpected EOF
>>> connection" error?
>>> Patch attached.
>>
>> Makes sense, applied.
>
> Thanks! So, let's go back to the original problem: pg_receivexlog
> still doesn't work fine
> under --disable-integer-datetimes. I previously posted the patch which
> fixes that problem.
> http://archives.postgresql.org/message-id/CAHGQGwFutqnFPBYcHUCuoy1zMVDXto=o4OgsjrBWxW4zj2TCSw@mail.gmail.com
>
> Attached is the updated version of the patch. Comments?

It contains a number of unrelated changes of %m -> %s - what's the
motivation for those?

You also removed the "safeguard" of always sleeping at least 1 second
- should we keep some level of safeguard there, even if it's not in
full seconds anymore?

Is the -1 sent into localTimestampDifference still relevent at all?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-05 13:36:18
Message-ID:	CAHGQGwHPBcsNk1Xr5ziruisws2eD3PvzaLufF2HMdJaGpH8OPg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 5, 2012 at 5:32 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> It contains a number of unrelated changes of %m -> %s - what's the
> motivation for those?

%m in fprintf() is glibc extension according to man page, so it's not portable
and should not be used, I think.

We discussed this before and reached consensus not to use %m :)
http://archives.postgresql.org/pgsql-hackers/2011-01/msg01674.php

> You also removed the "safeguard" of always sleeping at least 1 second
> - should we keep some level of safeguard there, even if it's not in
> full seconds anymore?
>
> Is the -1 sent into localTimestampDifference still relevent at all?

No because that "safeguard" would mess up with a user who sets
replication_timeout to less than one second. Though I'm not sure
whether there is really any user who wants such too short timeout....

Regards,

--
Fujii Masao

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-05 13:39:03
Message-ID:	CABUevEyfDg+PXeD=t98X07bQGgF9FT8trtLL9qAQ82Ey3wBGBA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 5, 2012 at 3:36 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jun 5, 2012 at 5:32 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> It contains a number of unrelated changes of %m -> %s - what's the
>> motivation for those?
>
> %m in fprintf() is glibc extension according to man page, so it's not portable
> and should not be used, I think.
>
> We discussed this before and reached consensus not to use %m :)
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01674.php

:-) there goes my memory.

That said, we're using %m in a fairly large number of places already,
but they're mostly in the backend. I guess we're safe there.

Anyway, +1 for making that change then, but I'll make it as a separate patch.

>> You also removed the "safeguard" of always sleeping at least 1 second
>> - should we keep some level of safeguard there, even if it's not in
>> full seconds anymore?
>>
>> Is the -1 sent into localTimestampDifference still relevent at all?
>
> No because that "safeguard" would mess up with a user who sets
> replication_timeout to less than one second. Though I'm not sure
> whether there is really any user who wants such too short timeout....

Right - I meant we might want to adjust the safeguad. Assuming <1 sec
is reasonable, maybe cap it at 100ms or so?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-05 13:47:06
Message-ID:	29990.1338904026@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Tue, Jun 5, 2012 at 3:36 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> We discussed this before and reached consensus not to use %m :)
>> http://archives.postgresql.org/pgsql-hackers/2011-01/msg01674.php

> :-) there goes my memory.

> That said, we're using %m in a fairly large number of places already,
> but they're mostly in the backend. I guess we're safe there.

It should only appear in elog/ereport calls; if there are any in bare
printfs, they are wrong, just as Fujii-san says. Frontend or backend
doesn't matter.

regards, tom lane

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-05 14:28:39
Message-ID:	CAHGQGwEaWuAX42uDXGk305NKsRtHXGLxvU_WaiDeRrfFAjy9iw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 5, 2012 at 10:39 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Tue, Jun 5, 2012 at 3:36 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Tue, Jun 5, 2012 at 5:32 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> You also removed the "safeguard" of always sleeping at least 1 second
>>> - should we keep some level of safeguard there, even if it's not in
>>> full seconds anymore?
>>>
>>> Is the -1 sent into localTimestampDifference still relevent at all?
>>
>> No because that "safeguard" would mess up with a user who sets
>> replication_timeout to less than one second. Though I'm not sure
>> whether there is really any user who wants such too short timeout....
>
> Right - I meant we might want to adjust the safeguad. Assuming <1 sec
> is reasonable, maybe cap it at 100ms or so?

On second thought, the status packet interval doesn't need to be given
in milliseconds at all. As I said, which would mess up with a user who sets
replication_timeout to less than 1 sec. But since wal_receiver_status_interval
is given in seconds, we've already messed up with them even if we've
changed pg_receivexlog so that its status interval can be given in
milliseconds.

We received no complaints about wal_receiver_status_interval so far, so
I think there is still no need to allow pg_receivexlog --statusint to be set
to less than 1 sec. Thought?

Regards,

--
Fujii Masao

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-05 14:42:31
Message-ID:	CABUevEzKiEEWnpm4VXEOW805MRxxY-OV058_AvM8vwaKrrTETA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 5, 2012 at 4:28 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jun 5, 2012 at 10:39 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> On Tue, Jun 5, 2012 at 3:36 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Tue, Jun 5, 2012 at 5:32 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> You also removed the "safeguard" of always sleeping at least 1 second
>>>> - should we keep some level of safeguard there, even if it's not in
>>>> full seconds anymore?
>>>>
>>>> Is the -1 sent into localTimestampDifference still relevent at all?
>>>
>>> No because that "safeguard" would mess up with a user who sets
>>> replication_timeout to less than one second. Though I'm not sure
>>> whether there is really any user who wants such too short timeout....
>>
>> Right - I meant we might want to adjust the safeguad. Assuming <1 sec
>> is reasonable, maybe cap it at 100ms or so?
>
> On second thought, the status packet interval doesn't need to be given
> in milliseconds at all. As I said, which would mess up with a user who sets
> replication_timeout to less than 1 sec. But since wal_receiver_status_interval
> is given in seconds, we've already messed up with them even if we've
> changed pg_receivexlog so that its status interval can be given in
> milliseconds.
>
> We received no complaints about wal_receiver_status_interval so far, so
> I think there is still no need to allow pg_receivexlog --statusint to be set
> to less than 1 sec. Thought?

Works for me. We still need a (reworked) patch, though, right? We just
move where the move between seconds and milliseconds happens?

I definitely don't think we need subsecond granularity in the user
facing number. Even a second is pretty short. (We do need to retain
the ability to set it to 0 = off of course).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-06 18:10:18
Message-ID:	CAHGQGwEbYM+9YPGq7ZwuEq8euRGubWs4WCujLsAWeUYkA4d_ng@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 5, 2012 at 11:42 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Works for me. We still need a (reworked) patch, though, right? We just
> move where the move between seconds and milliseconds happens?

Attached is the updated version of the patch.

> I definitely don't think we need subsecond granularity in the user
> facing number. Even a second is pretty short.

Yep.

> (We do need to retain the ability to set it to 0 = off of course).

Yep, a value of zero disables the status updates, and the patch adds
that explanation into the document of pg_basebackup and pg_receivexlog.

Regards,

--
Fujii Masao

Attachment	Content-Type	Size
timeout_handling_v4.patch	application/octet-stream	7.5 KB

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: incorrect handling of the timeout in pg_receivexlog
Date:	2012-06-10 10:23:31
Message-ID:	CABUevEwp5EcjLz_S-ZGhLGo+jEVjZLE4hOxT_yQ331FKTVawbw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jun 6, 2012 at 8:10 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Jun 5, 2012 at 11:42 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Works for me. We still need a (reworked) patch, though, right? We just
>> move where the move between seconds and milliseconds happens?
>
> Attached is the updated version of the patch.

Thanks.

>> I definitely don't think we need subsecond granularity in the user
>> facing number. Even a second is pretty short.
>
> Yep.
>
>> (We do need to retain the ability to set it to 0 = off of course).
>
> Yep, a value of zero disables the status updates, and the patch adds
> that explanation into the document of pg_basebackup and pg_receivexlog.

Applied, with some small modifications. For example, you don't need a
frontend-specific copy of #define's that are in the backend, since
those don't require linking to the backend, just the #include.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/