Lists: | pgsql-hackers |
---|
From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Walreceiver fsyncs excessively |
Date: | 2011-01-13 12:01:41 |
Message-ID: | 4D2EE9A5.5080202@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
While testing Fujii-san's patch to flush any already-written WAL on
error in walreceiver, I added a debugging elog to XLogWalRcvFlush() to
print out how far it has written and flushed.
I saw an unexpected pattern while the standby catches up with the master:
LOG: streaming replication successfully connected to primary
LOG: flushing flush=0/0 write=0/1E020000
LOG: flushing flush=0/1E020000 write=0/1E040000
LOG: flushing flush=0/1E040000 write=0/1E060000
LOG: flushing flush=0/1E060000 write=0/1E080000
LOG: flushing flush=0/1E080000 write=0/1E0A0000
LOG: flushing flush=0/1E0A0000 write=0/1E0C0000
LOG: flushing flush=0/1E0C0000 write=0/1E0E0000
LOG: flushing flush=0/1E0E0000 write=0/1E100000
LOG: flushing flush=0/1E100000 write=0/1E120000
LOG: flushing flush=0/1E120000 write=0/1E140000
The master sends the WAL at full-speed, but walreceiver always fsyncs it
in 128 kB chunks. That's excessive, it should be able to read and write
to disk the whole WAL segment before flushing.
There's a little flaw in the walreceiver logic that tries to read all
the available WAL before flushing and sleeping. The way libpqrcv_receive
is written, when it's called with timeout==0 it will not call
PQconsumeInput. So what happens is that when walreceiver main loop calls
libpqrcv_receive() in a loop to fetch all WAL that's available without
blocking, it actually only reads the WAL that's in the libpq receive
buffer - it will not read the WAL that's in the TCP read buffer.
Attached patch fixes libpqrcv_receive() so that it calls
PQconsumeInput() before concluding that there's no data available. The
excessive fsyncing can lead to very bad performance, so this needs to be
appled to 9.0 too.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Attachment | Content-Type | Size |
---|---|---|
fix-excessive-walreceiver-fsyncing.patch | text/x-diff | 1.8 KB |
From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Walreceiver fsyncs excessively |
Date: | 2011-01-13 12:34:39 |
Message-ID: | AANLkTi=Huh7Eg=+MadSuRPeatJ_KHEJ-_Sc_8_DxvOue@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Thu, Jan 13, 2011 at 9:01 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> While testing Fujii-san's patch to flush any already-written WAL on error in
> walreceiver, I added a debugging elog to XLogWalRcvFlush() to print out how
> far it has written and flushed.
>
> I saw an unexpected pattern while the standby catches up with the master:
>
> LOG: streaming replication successfully connected to primary
> LOG: flushing flush=0/0 write=0/1E020000
> LOG: flushing flush=0/1E020000 write=0/1E040000
> LOG: flushing flush=0/1E040000 write=0/1E060000
> LOG: flushing flush=0/1E060000 write=0/1E080000
> LOG: flushing flush=0/1E080000 write=0/1E0A0000
> LOG: flushing flush=0/1E0A0000 write=0/1E0C0000
> LOG: flushing flush=0/1E0C0000 write=0/1E0E0000
> LOG: flushing flush=0/1E0E0000 write=0/1E100000
> LOG: flushing flush=0/1E100000 write=0/1E120000
> LOG: flushing flush=0/1E120000 write=0/1E140000
>
> The master sends the WAL at full-speed, but walreceiver always fsyncs it in
> 128 kB chunks. That's excessive, it should be able to read and write to disk
> the whole WAL segment before flushing.
>
> There's a little flaw in the walreceiver logic that tries to read all the
> available WAL before flushing and sleeping. The way libpqrcv_receive is
> written, when it's called with timeout==0 it will not call PQconsumeInput.
> So what happens is that when walreceiver main loop calls libpqrcv_receive()
> in a loop to fetch all WAL that's available without blocking, it actually
> only reads the WAL that's in the libpq receive buffer - it will not read the
> WAL that's in the TCP read buffer.
>
> Attached patch fixes libpqrcv_receive() so that it calls PQconsumeInput()
> before concluding that there's no data available. The excessive fsyncing can
> lead to very bad performance, so this needs to be appled to 9.0 too.
Seems good.
Can we remove the "justconnected" flag, thanks to the patch?
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Walreceiver fsyncs excessively |
Date: | 2011-01-13 16:31:11 |
Message-ID: | 4D2F28CF.7040607@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 13.01.2011 14:34, Fujii Masao wrote:
> On Thu, Jan 13, 2011 at 9:01 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Attached patch fixes libpqrcv_receive() so that it calls PQconsumeInput()
>> before concluding that there's no data available. The excessive fsyncing can
>> lead to very bad performance, so this needs to be appled to 9.0 too.
>
> Seems good.
>
> Can we remove the "justconnected" flag, thanks to the patch?
Yes, good point. Committed with "justconnected" removed.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com