Streaming replication on win32, still broken

Lists: pgsql-hackers
From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming replication on win32, still broken
Date: 2010-02-15 15:37:36
Message-ID: 9837222c1002150737h4d3616edxc03e45c6ac278a6c@mail.gmail.com
Lists: pgsql-hackers

With the libpq fixes, I get further (more on that fix later, btw), but
now I get stuck in this. When I do something on the master that
generates WAL, such as insert a record, and then try to query this on
the slave, the walreceiver process crashes with:

PANIC: XX000: could not write to log file 0, segment 9 at offset 0, length 160:
Invalid argument
LOCATION: XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487

I'll keep digging at the details, but if somebody has a good idea here.. ;)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-16 09:56:16
Message-ID: 3f0b79eb1002160156h232fb56ka29d69c668fa80a2@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> With the libpq fixes, I get further (more on that fix later, btw), but
> now I get stuck in this. When I do something on the master that
> generates WAL, such as insert a record, and then try to query this on
> the slave, the walreceiver process crashes with:
>
> PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 160:
>  Invalid argument
> LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>
> I'll keep digging at the details, but if somebody has a good idea here.. ;)

Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
Though I've not identified the cause yet, I guess that it derives from wrong use
of the type of local variables in XLogWalRcvWrite(). I'll continue investigation
of it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-16 10:20:31
Message-ID: 9837222c1002160220l2ccec5aaqfccb55748c1c4892@mail.gmail.com
Lists: pgsql-hackers

2010/2/16 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:
> On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> With the libpq fixes, I get further (more on that fix later, btw), but
>> now I get stuck in this. When I do something on the master that
>> generates WAL, such as insert a record, and then try to query this on
>> the slave, the walreceiver process crashes with:
>>
>> PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 160:
>>  Invalid argument
>> LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>>
>> I'll keep digging at the details, but if somebody has a good idea here.. ;)
>
> Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
> Though I've not identified the cause yet, I guess that it derives from wrong use
> of the type of local variables in XLogWalRcvWrite(). I'll continue investigation
> of it.

Thanks!

I will be somewhat spottily available over the next two days due to
on-site work with clients.

Let me know if you would be helped by some details of how to get a
(somewhat faster) EC2 image up and running with MSVC to test on :-)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-16 12:40:11
Message-ID: 3f0b79eb1002160440md6edd30t6953398077b00bc5@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> 2010/2/16 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:
>> On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> With the libpq fixes, I get further (more on that fix later, btw), but
>>> now I get stuck in this. When I do something on the master that
>>> generates WAL, such as insert a record, and then try to query this on
>>> the slave, the walreceiver process crashes with:
>>>
>>> PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 160:
>>>  Invalid argument
>>> LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>>>
>>> I'll keep digging at the details, but if somebody has a good idea here.. ;)
>>
>> Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
>> Though I've not identified the cause yet, I guess that it derives from wrong use
>> of the type of local variables in XLogWalRcvWrite(). I'll continue investigation
>> of it.
>
> Thanks!
>
> I will be somewhat spottily available over the next two days due to
> on-site work with clients.
>
> Let me know if you would be helped by some details of how to get a
> (somewhat faster) EC2 image up and running with MSVC to test on :-)

Thanks! I can probably use the EC2 image by reading your great blog post.
http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

But it might take some time to make my sysadmin open the port for
rdesktop for some reason...

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-16 21:28:27
Message-ID: 9837222c1002161328m4b2a7c49m6d6c9b24f141cd49@mail.gmail.com
Lists: pgsql-hackers

2010/2/16 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:
> On Tue, Feb 16, 2010 at 7:20 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> 2010/2/16 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:
>>> On Tue, Feb 16, 2010 at 12:37 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> With the libpq fixes, I get further (more on that fix later, btw), but
>>>> now I get stuck in this. When I do something on the master that
>>>> generates WAL, such as insert a record, and then try to query this on
>>>> the slave, the walreceiver process crashes with:
>>>>
>>>> PANIC:  XX000: could not write to log file 0, segment 9 at offset 0, length 160:
>>>>  Invalid argument
>>>> LOCATION:  XLogWalRcvWrite, .\src\backend\replication\walreceiver.c:487
>>>>
>>>> I'll keep digging at the details, but if somebody has a good idea here.. ;)
>>>
>>> Yeah, this problem was reproduced in my (very slow :-( ) MinGW environment, too.
>>> Though I've not identified the cause yet, I guess that it derives from wrong use
>>> of the type of local variables in XLogWalRcvWrite(). I'll continue investigation
>>> of it.
>>
>> Thanks!
>>
>> I will be somewhat spottily available over the next two days due to
>> on-site work with clients.
>>
>> Let me know if you would be helped by some details of how to get a
>> (somewhat faster) EC2 image up and running with MSVC to test on :-)
>
> Thanks! I can probably use the EC2 image by reading your great blog post.
> http://blog.hagander.net/archives/151-Testing-PostgreSQL-patches-on-Windows-using-Amazon-EC2.html

Actually, that one doesn't work anymore, because I managed to break
the image :-)

If you send me your amazon id, I can get you permissions on my private
image. I plan to clean it up and make it public, just haven't gotten
around to it yet...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 05:55:41
Message-ID: 3f0b79eb1002162155u2f2f81f7w3281c894d8ae2c63@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> If you send me your amazon id, I can get you permissions on my private
> image. I plan to clean it up and make it public, just haven't gotten
> around to it yet...

Thanks for your concern! I'll send the ID when I complete the preparation.

And, fortunately?, when I set wal_sync_method to open_sync, the problem was
reproduced on Linux, too. The cause is that the data that is written by
walreceiver is not aligned, even if O_DIRECT is used. On win32, O_DIRECT is
used by default. So the problem always happened on win32.
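
For illustration, here is a minimal C sketch of the alignment check that direct
I/O implies. The 512-byte unit is an assumption (the real requirement depends
on the device and filesystem), and the function name is hypothetical, not
taken from walreceiver.c. The buffer address normally has to be aligned as
well; the sketch only looks at the file offset and the length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed alignment unit for direct I/O; not a value from the thread. */
#define ASSUMED_DIRECT_IO_ALIGN 512

/*
 * Hypothetical helper: returns true when both the file offset and the
 * length of a write are multiples of "align" (which must be a power of two).
 */
static bool
write_span_is_aligned(uint64_t offset, uint64_t len, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	return (offset & (align - 1)) == 0 && (len & (align - 1)) == 0;
}
```

With the values from the PANIC message (offset 0, length 160), the length is
not a multiple of the assumed unit, so the check fails, which is consistent
with the "Invalid argument" error.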

I propose two solution ideas:

1. O_DIRECT is somewhat harmful in the standby since the data written by
walreceiver is read by the startup process immediately. So, how about
making only walreceiver not use O_DIRECT?

2. Straightforwardly observe the alignment rule. Since the received WAL
data might start at the middle of WAL block, walreceiver needs to keep
the last half-written WAL block for alignment. OTOH since the received
data might end at the middle of WAL block, walreceiver needs zero-padding.
As a result, walreceiver writes the set of the last WAL block, received
data and zero-padding.
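
The arithmetic behind idea 2 can be sketched as follows. This is only an
illustration under stated assumptions: the struct, the function name and the
SKETCH_WAL_BLCKSZ constant are hypothetical, and 8192 is assumed as the WAL
block size (PostgreSQL's default XLOG_BLCKSZ). It computes where an aligned
write would start, how much of the previous partial block has to be rewritten,
and how much zero-padding is needed at the end.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed WAL block size for the sketch. */
#define SKETCH_WAL_BLCKSZ 8192

/* Hypothetical description of one block-aligned write. */
typedef struct AlignedSpan
{
	uint64_t	start;		/* block-aligned offset where the write begins */
	uint64_t	head_keep;	/* bytes of the last partial block to rewrite */
	uint64_t	tail_pad;	/* zero bytes appended after the received data */
	uint64_t	total;		/* total bytes written, a multiple of blcksz */
} AlignedSpan;

/*
 * Given the offset and length of the received data, compute the aligned
 * span that covers it. "blcksz" must be a power of two.
 */
static AlignedSpan
compute_aligned_span(uint64_t offset, uint64_t len, uint64_t blcksz)
{
	AlignedSpan span;
	uint64_t	end = offset + len;
	uint64_t	aligned_end;

	assert(blcksz != 0 && (blcksz & (blcksz - 1)) == 0);

	span.start = offset & ~(blcksz - 1);			/* round down */
	span.head_keep = offset - span.start;
	aligned_end = (end + blcksz - 1) & ~(blcksz - 1);	/* round up */
	span.tail_pad = aligned_end - end;
	span.total = aligned_end - span.start;
	return span;
}
```

For example, 100 bytes received at offset 8352 give a start of 8192, 160 bytes
kept from the previous partial block and 7932 bytes of zero-padding, i.e. one
full 8192-byte write.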

Which is better? Or do you have another better idea?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 06:03:44
Message-ID: 9837222c1002162203m4ef4139clc1d7b43c90e19948@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 06:55, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Feb 17, 2010 at 6:28 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> If you send me your amazon id, I can get you permissions on my private
>> image. I plan to clean it up and make it public, just haven't gotten
>> around to it yet...
>
> Thanks for your concern! I'll send the ID when I complete the preparation.

ok.

> And, fortunately?, when I set wal_sync_method to open_sync, the problem was
> reproduced on Linux, too. The cause is that the data that is written by

Ah, that's good. It always helps if it's a cross-platform issue -
particularly in that it's not one of the funky win32 specific things
we did :)

> walreceiver is not aligned, even if O_DIRECT is used. On win32, O_DIRECT is
> used by default. So the problem always happened on win32.

Ahh. I see.

> I propose two solution ideas:
>
> 1. O_DIRECT is somewhat harmful in the standby since the data written by
>   walreceiver is read by the startup process immediately. So, how about
>   making only walreceiver not use O_DIRECT?

In that case, O_DIRECT would be counterproductive, no? It maps to
FILE_FLAG_NO_BUFFERING, which makes sure it doesn't go into the
cache. So the read in the startup proc is actually guaranteed to
require a physical read - of something we just wrote, so it'll almost
certainly end up waiting for a rotation, no?

Seems like getting rid of O_DIRECT here is the right thing to do,
regardless of this.

> 2. Straightforwardly observe the alignment rule. Since the received WAL
>   data might start at the middle of WAL block, walreceiver needs to keep
>   the last half-written WAL block for alignment. OTOH since the received
>   data might end at the middle of WAL block, walreceiver needs zero-padding.
>   As a result, walreceiver writes the set of the last WAL block, received
>   data and zero-padding.

May there be other reasons to do this as well?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 06:27:17
Message-ID: 20027.1266388037@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Wed, Feb 17, 2010 at 06:55, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> 2. Straightforwardly observe the alignment rule. Since the received WAL
>> data might start at the middle of WAL block, walreceiver needs to keep
>> the last half-written WAL block for alignment. OTOH since the received
>> data might end at the middle of WAL block, walreceiver needs zero-padding.
>> As a result, walreceiver writes the set of the last WAL block, received
>> data and zero-padding.

> May there be other reasons to do this as well?

Writing misaligned data is certain to be expensive even when it works...

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 07:07:01
Message-ID: 3f0b79eb1002162307j7169fefdv5649420cc653335@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> In that case, O_DIRECT would be counterproductive, no? It maps to
> FILE_FLAG_NO_BUFFERING, which makes sure it doesn't go into the
> cache. So the read in the startup proc is actually guaranteed to
> require a physical read - of something we just wrote, so it'll almost
> certainly end up waiting for a rotation, no?
>
> Seems like getting rid of O_DIRECT here is the right thing to do,
> regardless of this.

Agreed. I'll remove O_DIRECT from walreceiver.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 07:07:22
Message-ID: 3f0b79eb1002162307g552cd8ecv43282ba814c87246@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 3:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Wed, Feb 17, 2010 at 06:55, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> 2. Straightforwardly observe the alignment rule. Since the received WAL
>>>   data might start at the middle of WAL block, walreceiver needs to keep
>>>   the last half-written WAL block for alignment. OTOH since the received
>>>   data might end at the middle of WAL block, walreceiver needs zero-padding.
>>>   As a result, walreceiver writes the set of the last WAL block, received
>>>   data and zero-padding.
>
>> May there be other reasons to do this as well?
>
> Writing misaligned data is certain to be expensive even when it works...

Yeah, right. After I remove O_DIRECT, I'll change walreceiver so that it
aligns its writes correctly, and then I'll test the performance.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 09:00:59
Message-ID: 3f0b79eb1002170100v6d2b36b6ifa430318717f6f09@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> In that case, O_DIRECT would be counterproductive, no? It maps to
>> FILE_FLAG_NO_BUFFERING, which makes sure it doesn't go into the
>> cache. So the read in the startup proc is actually guaranteed to
>> require a physical read - of something we just wrote, so it'll almost
>> certainly end up waiting for a rotation, no?
>>
>> Seems like getting rid of O_DIRECT here is the right thing to do,
>> regardless of this.
>
> Agreed. I'll remove O_DIRECT from walreceiver.

Here is the patch to do that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
remove_o_direct_from_walrcv_0217.patch text/x-patch 5.5 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 10:27:38
Message-ID: 3f0b79eb1002170227i63ebcd9fp648c2794fcb1efda@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 17, 2010 at 6:00 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> In that case, O_DIRECT would be counterproductive, no? It maps to
>>> FILE_FLAG_NO_BUFFERING, which makes sure it doesn't go into the
>>> cache. So the read in the startup proc is actually guaranteed to
>>> require a physical read - of something we just wrote, so it'll almost
>>> certainly end up waiting for a rotation, no?
>>>
>>> Seems like getting rid of O_DIRECT here is the right thing to do,
>>> regardless of this.
>>
>> Agreed. I'll remove O_DIRECT from walreceiver.
>
> Here is the patch to do that.

Oops! I found a bug in the patch. Here is the updated version.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
remove_o_direct_from_walrcv_0217_v2.patch text/x-diff 5.5 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-17 20:28:59
Message-ID: 4B7C518B.3010305@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Wed, Feb 17, 2010 at 6:00 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Wed, Feb 17, 2010 at 4:07 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> On Wed, Feb 17, 2010 at 3:03 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>>> In that case, O_DIRECT would be counterproductive, no? It maps to
>>>> FILE_FLAG_NO_BUFFERING, which makes sure it doesn't go into the
>>>> cache. So the read in the startup proc is actually guaranteed to
>>>> require a physical read - of something we just wrote, so it'll almost
>>>> certainly end up waiting for a rotation, no?
>>>>
>>>> Seems like getting rid of O_DIRECT here is the right thing to do,
>>>> regardless of this.
>>> Agreed. I'll remove O_DIRECT from walreceiver.
>> Here is the patch to do that.
>
> Oops! I found a bug in the patch. Here is the updated version.

If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
issue_xlog_fsync() will do nothing because it assumes the write() synced
it already. So the data written isn't being forced to disk at all.

How about just forcing sync_method to 'fsync' in walreceiver?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 01:40:31
Message-ID: 3f0b79eb1002171740y2379a16aw4e25852bb9c3c87c@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
> walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
> When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
> issue_xlog_fsync() will do nothing because it assumes the write() synced
> it already. So the data written isn't being forced to disk at all.

When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC
flag. So I think that write() flushes the data to disk even if O_DIRECT
flag is not given. Am I missing something?
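
To make the point concrete, here is a small C sketch. It is not the code from
xlog.c: the enum, the SKETCH_FLAG_* bits (stand-ins for O_SYNC and O_DIRECT)
and both function names are hypothetical. It only shows that the "sync" bit
and the "direct" bit are independent, and that the explicit flush step can be
skipped for open_sync because of the sync bit, not because of O_DIRECT.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of the sync methods discussed in the thread. */
typedef enum WalSyncMethod
{
	WAL_SYNC_FSYNC,
	WAL_SYNC_OPEN_SYNC
} WalSyncMethod;

/* Stand-ins for O_SYNC and O_DIRECT; the values are arbitrary. */
#define SKETCH_FLAG_SYNC	0x1
#define SKETCH_FLAG_DIRECT	0x2

/* Pick the open() flag bits for a WAL file. */
static int
sketch_open_flags(WalSyncMethod method, bool allow_direct)
{
	int			flags = 0;

	if (method == WAL_SYNC_OPEN_SYNC)
	{
		flags |= SKETCH_FLAG_SYNC;		/* write() itself reaches disk */
		if (allow_direct)
			flags |= SKETCH_FLAG_DIRECT;	/* additionally bypass the cache */
	}
	return flags;
}

/* Does the caller still need an fsync-style call after write()? */
static bool
needs_explicit_flush(WalSyncMethod method)
{
	switch (method)
	{
		case WAL_SYNC_FSYNC:
			return true;
		case WAL_SYNC_OPEN_SYNC:
			return false;		/* the sync bit already forced the data out */
	}
	assert(!"unknown sync method");
	return true;
}
```

In this sketch, dropping the direct bit for walreceiver leaves the sync bit in
place, so skipping the explicit flush under open_sync stays safe.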

> How about just forcing sync_method to 'fsync' in walreceiver?

In win32, O_DSYNC seems to be preferred to 'fsync' so far. So I'm not sure
if reshuffling of priority is harmless.
http://archives.postgresql.org/pgsql-hackers-win32/2005-03/msg00148.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 07:38:15
Message-ID: 4B7CEE67.8010002@enterprisedb.com
Lists: pgsql-hackers

Fujii Masao wrote:
> On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
>> walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
>> When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
>> issue_xlog_fsync() will do nothing because it assumes the write() synced
>> it already. So the data written isn't being forced to disk at all.
>
> When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC
> flag. So I think that write() flushes the data to disk even if O_DIRECT
> flag is not given. Am I missing something?

Ah, ok, you're right.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 10:01:12
Message-ID: 9837222c1002180201r2e4626fra7620500e22e193f@mail.gmail.com
Lists: pgsql-hackers

2010/2/18 Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>:
> Fujii Masao wrote:
>> On Thu, Feb 18, 2010 at 5:28 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> If I'm reading the patch correctly, when wal_sync_method is 'open_sync',
>>> walreceiver nevertheless opens the WAL file without the O_DIRECT flag.
>>> When it later flushes it in XLogWalRcvFlush() by issue_xlog_fsync(),
>>> issue_xlog_fsync() will do nothing because it assumes the write() synced
>>> it already. So the data written isn't being forced to disk at all.
>>
>> When 'open_sync' is chosen, the WAL file is opened with O_SYNC or O_FSYNC
>> flag. So I think that write() flushes the data to disk even if O_DIRECT
>> flag is not given. Am I missing something?
>
> Ah, ok, you're right.

Yes, I believe the difference is that with O_DIRECT it bypasses the
cache completely. Without it, we still sync it out, but it also goes
into the cache.

O_DIRECT helps us when we're not going to read the file again, because
we don't waste cache on it. If we are, which is the case here, it
should be really bad for performance, since we actually have to do a
physical read.

Incidentally, that should also apply to general WAL when archive_mode
is on. Do we optimize for that?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 10:04:56
Message-ID: 4B7D10C8.5000301@enterprisedb.com
Lists: pgsql-hackers

Magnus Hagander wrote:
> O_DIRECT helps us when we're not going to read the file again, because
> we don't waste cache on it. If we are, which is the case here, it
> should be really bad for performance, since we actually have to do a
> physical read.
>
> Incidentally, that should also apply to general WAL when archive_mode
> is on. Do we optimize for that?

Hmm, no we don't. We do take that into account so that we refrain from
issuing posix_fadvise(DONTNEED) if archive_mode is on, but we don't
disable O_DIRECT. Maybe we should..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 10:39:26
Message-ID: 3f0b79eb1002180239g2443e3b3qd8657c2662415c42@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Magnus Hagander wrote:
>> O_DIRECT helps us when we're not going to read the file again, because
>> we don't waste cache on it. If we are, which is the case here, it
>> should be really bad for performance, since we actually have to do a
>> physical read.
>>
>> Incidentally, that should also apply to general WAL when archive_mode
>> is on. Do we optimize for that?
>
> Hmm, no we don't. We do take that into account so that we refrain from
> issuing posix_fadvise(DONTNEED) if archive_mode is on, but we don't
> disable O_DIRECT. Maybe we should..

Since the performance of WAL write is more important than that of WAL
archiving in general, that optimization might offer little benefit.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 11:14:50
Message-ID: 9837222c1002180314g56e82fd0g472845ed28b6deec@mail.gmail.com
Lists: pgsql-hackers

2010/2/18 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:
> On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Magnus Hagander wrote:
>>> O_DIRECT helps us when we're not going to read the file again, because
>>> we don't waste cache on it. If we are, which is the case here, it
>>> should be really bad for performance, since we actually have to do a
>>> physical read.
>>>
>>> Incidentally, that should also apply to general WAL when archive_mode
>>> is on. Do we optimize for that?
>>
>> Hmm, no we don't. We do take that into account so that we refrain from
>> issuing posix_fadvise(DONTNEED) if archive_mode is on, but we don't
>> disable O_DIRECT. Maybe we should..
>
> Since the performance of WAL write is more important than that of WAL
> archiving in general, that optimization might offer little benefit.

Well, it's going to make the process that reads the WAL cause actual
physical I/O... That'll take a chunk out of your total available I/O,
which is likely to push you to the limit of your I/O capacity much
quicker.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-18 12:46:46
Message-ID: 4B7D36B6.7030903@enterprisedb.com
Lists: pgsql-hackers

Magnus Hagander wrote:
> 2010/2/18 Fujii Masao <masao(dot)fujii(at)gmail(dot)com>:
>> On Thu, Feb 18, 2010 at 7:04 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Magnus Hagander wrote:
>>>> O_DIRECT helps us when we're not going to read the file again, because
>>>> we don't waste cache on it. If we are, which is the case here, it
>>>> should be really bad for performance, since we actually have to do a
>>>> physical read.
>>>>
>>>> Incidentally, that should also apply to general WAL when archive_mode
>>>> is on. Do we optimize for that?
>>> Hmm, no we don't. We do take that into account so that we refrain from
>>> issuing posix_fadvise(DONTNEED) if archive_mode is on, but we don't
>>> disable O_DIRECT. Maybe we should..
>> Since the performance of WAL write is more important than that of WAL
>> archiving in general, that optimization might offer little benefit.
>
> Well, it's going to make the process that reads the WAL cause actual
> physical I/O... That'll take a chunk out of your total available I/O,
> which is likely to push you to the limit of your I/O capacity much
> quicker.

Right, doesn't seem sensible, though it would be nice to see a benchmark
on that.

Here's a patch to disable O_DIRECT when archiving or streaming is
enabled. This is pretty hard to test, so any extra eyeballs would be nice..
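
The rule described here could look roughly like the C sketch below. It is not
the attached patch: the struct and function names are hypothetical, and the
posix_fadvise(DONTNEED) condition simply mirrors what was said earlier in the
thread (the hint is skipped when archive_mode is on). Whether streaming should
suppress that hint too is not stated in the thread, so the sketch leaves it
alone.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how WAL files interact with the OS cache. */
typedef struct WalCachePolicy
{
	bool		use_direct_io;		/* open WAL files with O_DIRECT */
	bool		advise_dontneed;	/* issue posix_fadvise(DONTNEED) */
} WalCachePolicy;

/*
 * Direct I/O only pays off when nobody reads the WAL back soon. The
 * archiver and streaming replication both re-read it, so either one
 * turns direct I/O off.
 */
static WalCachePolicy
choose_wal_cache_policy(bool archive_mode, bool streaming_enabled)
{
	WalCachePolicy policy;

	policy.use_direct_io = !(archive_mode || streaming_enabled);
	policy.advise_dontneed = !archive_mode;	/* as stated in the thread */
	return policy;
}

/* Small usage example: a streaming-only setup keeps WAL in the cache. */
static void
example_usage(void)
{
	WalCachePolicy p = choose_wal_cache_policy(false, true);

	assert(!p.use_direct_io);
}
```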

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
no-O_DIRECT-with-archiving-1.patch text/x-diff 15.3 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-19 10:54:50
Message-ID: 4B7E6DFA.9090200@enterprisedb.com
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Magnus Hagander wrote:
>> Well, it's going to make the process that reads the WAL cause actual
>> physical I/O... That'll take a chunk out of your total available I/O,
>> which is likely to push you to the limit of your I/O capacity much
>> quicker.
>
> Right, doesn't seem sensible, though it would be nice to see a benchmark
> on that.
>
> Here's a patch to disable O_DIRECT when archiving or streaming is
> enabled. This is pretty hard to test, so any extra eyeballs would be nice..

Committed. Can you check that this fixed the PANIC you saw?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication on win32, still broken
Date: 2010-02-22 04:47:03
Message-ID: 3f0b79eb1002212047l12c78dffo1bacaafed0e8668c@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 19, 2010 at 7:54 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Heikki Linnakangas wrote:
>> Magnus Hagander wrote:
>>> Well, it's going to make the process that reads the WAL cause actual
>>> physical I/O... That'll take a chunk out of your total available I/O,
>>> which is likely to push you to the limit of your I/O capacity much
>>> quicker.
>>
>> Right, doesn't seem sensible, though it would be nice to see a benchmark
>> on that.
>>
>> Here's a patch to disable O_DIRECT when archiving or streaming is
>> enabled. This is pretty hard to test, so any extra eyeballs would be nice..
>
> Committed. Can you check that this fixed the PANIC you saw?

Thanks! Yeah, SR works fine in my MinGW environment.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center