Re: Reducing walreceiver latency with a latch

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Reducing walreceiver latency with a latch
Date: 2010-09-13 11:40:15
Message-ID: 4C8E0D9F.9090601@enterprisedb.com

Now that we have the wonderful latch facility, let's use it to reduce
the delay between receiving a piece of WAL and applying it in the standby.
Currently, the startup process polls every 100 ms to see if new WAL has
arrived, which adds on average a 50 ms delay between a transaction
commit in the master and it appearing as committed in a hot standby
server. The latch patch already eliminated a similar polling delay in
walsender; the attached patch does the same for walreceiver.
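
The 50 ms figure follows from the polling interval: if WAL arrives at a uniformly random moment within a 100 ms sleep, it waits half the interval on average before the next poll notices it. A minimal sketch of that arithmetic, assuming uniform arrival times; the function name `mean_poll_delay_ms` is hypothetical and not part of the patch:

```c
#include <assert.h>

/*
 * Hypothetical helper: average time a piece of WAL waits before the next
 * poll notices it, assuming arrivals are spread evenly over one polling
 * interval. Arrival times are sampled at the midpoint of each of
 * `samples` equal slices of the interval.
 */
double mean_poll_delay_ms(double poll_interval_ms, int samples)
{
    assert(poll_interval_ms > 0.0 && samples > 0);

    double total = 0.0;
    for (int i = 0; i < samples; i++) {
        /* Arrival offset since the last poll. */
        double arrival = (i + 0.5) * poll_interval_ms / samples;
        /* Time remaining until the next poll wakes up. */
        total += poll_interval_ms - arrival;
    }
    return total / samples;
}
```

With a 100 ms interval the result is half the interval, which is the delay a latch-based wakeup avoids.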

After this patch, there are no unnecessary delays in the streaming
replication code path. Note that this is all still asynchronous, just
with reduced latency.

This is pretty straightforward, but any comments?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
walreceiver-latch-1.patch text/x-diff 4.9 KB

From: Thom Brown <thom(at)linux(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-13 11:47:35
Message-ID: AANLkTikAZUAmwRvPD90fg4Q_e0zBfMTRWkcQH5hQmBQH@mail.gmail.com

On 13 September 2010 12:40, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Now that we have the wonderful latch facility, let's use it to reduce the
> delay between receiving a piece of WAL and applying it in the standby.
> Currently, the startup process polls every 100 ms to see if new WAL has
> arrived, which adds on average a 50 ms delay between a transaction commit in
> the master and it appearing as committed in a hot standby server. The latch
> patch already eliminated a similar polling delay in walsender; the attached
> patch does the same for walreceiver.
>
> After this patch, there are no unnecessary delays in the streaming
> replication code path. Note that this is all still asynchronous, just with
> reduced latency.
>
> This is pretty straightforward, but any comments?

Is that supposed to be waiting 5000ms?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Thom Brown <thom(at)linux(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-13 11:52:42
Message-ID: AANLkTimEVgkoYyR+cwjV9rbo81wc=kA9khps7_848j1Y@mail.gmail.com

On 13 September 2010 12:47, Thom Brown <thom(at)linux(dot)com> wrote:
> On 13 September 2010 12:40, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Now that we have the wonderful latch facility, let's use it to reduce the
>> delay between receiving a piece of WAL and applying it in the standby.
>> Currently, the startup process polls every 100 ms to see if new WAL has
>> arrived, which adds on average a 50 ms delay between a transaction commit in
>> the master and it appearing as committed in a hot standby server. The latch
>> patch already eliminated a similar polling delay in walsender; the attached
>> patch does the same for walreceiver.
>>
>> After this patch, there are no unnecessary delays in the streaming
>> replication code path. Note that this is all still asynchronous, just with
>> reduced latency.
>>
>> This is pretty straightforward, but any comments?
>
> Is that supposed to be waiting 5000ms?

Ignore me, I can see that it's right.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-13 11:54:09
Message-ID: 4C8E10E1.2070104@enterprisedb.com

On 13/09/10 14:47, Thom Brown wrote:
> On 13 September 2010 12:40, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Now that we have the wonderful latch facility, let's use it to reduce the
>> delay between receiving a piece of WAL and applying it in the standby.
>> Currently, the startup process polls every 100 ms to see if new WAL has
>> arrived, which adds on average a 50 ms delay between a transaction commit in
>> the master and it appearing as committed in a hot standby server. The latch
>> patch already eliminated a similar polling delay in walsender; the attached
>> patch does the same for walreceiver.
>>
>> After this patch, there are no unnecessary delays in the streaming
>> replication code path. Note that this is all still asynchronous, just with
>> reduced latency.
>>
>> This is pretty straightforward, but any comments?
>
> Is that supposed to be waiting 5000ms?

Yes. The wait is interrupted as soon as WAL arrives; the timeout is only
there to poll for the standby trigger file to appear, or for SIGTERM.
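
The wait pattern described here can be sketched outside PostgreSQL with a pipe and poll(). This is a toy stand-in under stated assumptions, not the latch implementation from the patch: `ToyLatch`, `toy_latch_*`, and `toy_wait_for_wal` are hypothetical names, the trigger file is checked with a plain access() call, and SIGTERM handling is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical toy latch: the read end becomes readable once it is set. */
typedef struct ToyLatch {
    int read_fd;
    int write_fd;
} ToyLatch;

typedef enum { TOY_WAKE_NEW_WAL, TOY_WAKE_TRIGGER } ToyWakeReason;

bool toy_latch_init(ToyLatch *latch)
{
    int fds[2];

    assert(latch != NULL);
    if (pipe(fds) != 0)
        return false;
    /* Non-blocking on both ends so setting and draining never stall. */
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) != 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    latch->read_fd = fds[0];
    latch->write_fd = fds[1];
    return true;
}

/* Waker side: called when new WAL has been received. */
void toy_latch_set(ToyLatch *latch)
{
    char byte = 1;
    /* A full pipe means the latch is already set, so failure is harmless. */
    ssize_t rc = write(latch->write_fd, &byte, 1);
    (void) rc;
}

/* Waiter side: drain the pipe so the next wait blocks again. */
void toy_latch_reset(ToyLatch *latch)
{
    char buf[64];
    while (read(latch->read_fd, buf, sizeof buf) > 0)
        ;
}

/* Returns true if the latch is set, false on timeout or interruption. */
bool toy_latch_wait(ToyLatch *latch, int timeout_ms)
{
    struct pollfd pfd = { .fd = latch->read_fd, .events = POLLIN };
    return poll(&pfd, 1, timeout_ms) > 0 && (pfd.revents & POLLIN) != 0;
}

/*
 * Sleep until WAL arrives, waking up every timeout_ms only to re-check
 * for the trigger file. The wait itself returns as soon as the latch is set.
 */
ToyWakeReason toy_wait_for_wal(ToyLatch *latch, const char *trigger_path,
                               int timeout_ms)
{
    for (;;) {
        if (access(trigger_path, F_OK) == 0)
            return TOY_WAKE_TRIGGER;
        if (toy_latch_wait(latch, timeout_ms)) {
            toy_latch_reset(latch);
            return TOY_WAKE_NEW_WAL;
        }
    }
}
```

With a 5000 ms timeout, the loop returns immediately when the latch is set, and otherwise notices a trigger file within one timeout period. The long timeout therefore adds no latency to the WAL path.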

BTW, I noticed that I missed incrementing the latch count in
win32_latch.c, and that owning/disowning the latch wasn't done correctly:
you get an error if you restart the master and reconnect. I'll post an
updated patch shortly.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Thom Brown <thom(at)linux(dot)com>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-13 12:13:10
Message-ID: 4C8E1556.1040900@enterprisedb.com

On 13/09/10 14:54, Heikki Linnakangas wrote:
> BTW, I noticed that I missed incrementing the latch count in
> win32_latch.c, and that owning/disowning the latch wasn't done correctly:
> you get an error if you restart the master and reconnect. I'll post an
> updated patch shortly.

Here's an updated patch with those bugs fixed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
walreceiver-latch-2.patch text/x-diff 6.9 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-14 02:02:24
Message-ID: AANLkTim3aHuppiNvouv-z4TqU-h4qmwzK=sbTYF9ewg2@mail.gmail.com

On Mon, Sep 13, 2010 at 9:13 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Here's an updated patch with those bugs fixed.

Great!

+       /*
+        * Walreceiver sets this latch every time new WAL has been received and
+        * fsync'd to disk, allowing startup process to wait for new WAL to
+        * arrive.
+        */
+       Latch           receivedLatch;

I think that this latch should be available for more than just walreceiver -
startup process communication. For example, backend - startup process
communication, which could be used in the future to let users request a
failover via an SQL function. What about putting the latch in XLogCtl instead
of WalRcv and calling OwnLatch at the beginning of the startup process instead
of in RequestXLogStreaming?
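
A rough sketch of what a shared wakeup latch could look like, under loud assumptions: `ToyXLogCtl`, the reason flags, and both functions are hypothetical illustrations, a plain boolean stands in for a real Latch, and there is no cross-process synchronization. Each waker records why it is waking the startup process and then sets the one latch; the startup process resets the latch and reads all pending reasons.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wake reasons; any number of processes may post them. */
enum {
    TOY_WAKE_WAL_RECEIVED = 1u << 0,   /* posted by walreceiver */
    TOY_WAKE_FAILOVER     = 1u << 1    /* posted by a backend */
};

/* Toy stand-in for shared state; a real Latch is not a plain bool. */
typedef struct ToyXLogCtl {
    bool     recovery_wakeup_latch;
    unsigned pending_reasons;
} ToyXLogCtl;

/* Waker side: record the reason first, then set the latch. */
void toy_wake_startup(ToyXLogCtl *ctl, unsigned reason)
{
    assert(ctl != NULL);
    ctl->pending_reasons |= reason;
    ctl->recovery_wakeup_latch = true;
}

/*
 * Startup process side: reset the latch, then collect every reason that
 * was posted since the previous call. Returns 0 if the latch was not set.
 */
unsigned toy_consume_wakeups(ToyXLogCtl *ctl)
{
    assert(ctl != NULL);
    if (!ctl->recovery_wakeup_latch)
        return 0;
    ctl->recovery_wakeup_latch = false;

    unsigned reasons = ctl->pending_reasons;
    ctl->pending_reasons = 0;
    return reasons;
}
```

Because the latch lives in one shared structure rather than in walreceiver's own state, a new waker only needs a new reason flag, not a new latch.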

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-14 08:51:01
Message-ID: 4C8F3775.5090308@enterprisedb.com

On 14/09/10 05:02, Fujii Masao wrote:
> +       /*
> +        * Walreceiver sets this latch every time new WAL has been received and
> +        * fsync'd to disk, allowing startup process to wait for new WAL to
> +        * arrive.
> +        */
> +       Latch           receivedLatch;
>
> I think that this latch should be available for more than just walreceiver -
> startup process communication. For example, backend - startup process
> communication, which could be used in the future to let users request a
> failover via an SQL function. What about putting the latch in XLogCtl instead
> of WalRcv and calling OwnLatch at the beginning of the startup process instead
> of in RequestXLogStreaming?

Yes, good point. I updated the patch along those lines, attached.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
walreceiver-latch-3.patch text/x-diff 5.0 KB

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: Reducing walreceiver latency with a latch
Date: 2010-09-14 13:46:07
Message-ID: AANLkTimzyiiMckOHLJKw6y68tfpY89Rd585rVJS7z1iP@mail.gmail.com

On Tue, Sep 14, 2010 at 5:51 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 14/09/10 05:02, Fujii Masao wrote:
>>
>> +       /*
>> +        * Walreceiver sets this latch every time new WAL has been received and
>> +        * fsync'd to disk, allowing startup process to wait for new WAL to
>> +        * arrive.
>> +        */
>> +       Latch           receivedLatch;
>>
>> I think that this latch should be available for more than just walreceiver -
>> startup process communication. For example, backend - startup process
>> communication, which could be used in the future to let users request a
>> failover via an SQL function. What about putting the latch in XLogCtl instead
>> of WalRcv and calling OwnLatch at the beginning of the startup process instead
>> of in RequestXLogStreaming?
>
> Yes, good point. I updated the patch along those lines, attached.

Looks good.

+       /*
+        * Take ownership of the wakeup latch if we're going to sleep during
+        * recovery.
+        */
+       if (StandbyMode)
+               OwnLatch(&XLogCtl->recoveryWakeupLatch);

Since automatic restart after backend crash always performs a normal crash
recovery, the startup process will never call OwnLatch more than once. So
there might be no harm even if the startup process doesn't disown the shared
latch. But... what about calling DisownLatch at the end of recovery just in
case?
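
To illustrate why disowning matters, here is a toy model of latch ownership. It assumes, as the error Heikki mentioned upthread suggests, that taking ownership of an already-owned latch fails; `ToyOwnedLatch`, `toy_own_latch`, and `toy_disown_latch` are hypothetical names, and the real OwnLatch/DisownLatch may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared latch that records which process may wait on it. */
typedef struct ToyOwnedLatch {
    int  owner_pid;   /* 0 means nobody owns the latch */
    bool is_set;
} ToyOwnedLatch;

/*
 * Try to take ownership. Returns false if some process already owns the
 * latch, which models the error a second owner would run into.
 */
bool toy_own_latch(ToyOwnedLatch *latch, int pid)
{
    assert(latch != NULL && pid > 0);
    if (latch->owner_pid != 0)
        return false;
    latch->owner_pid = pid;
    return true;
}

/* Give up ownership so that a later process can own the latch again. */
void toy_disown_latch(ToyOwnedLatch *latch, int pid)
{
    assert(latch != NULL && latch->owner_pid == pid);
    latch->owner_pid = 0;
}
```

If the first owner exits without disowning, every later attempt to own the latch fails. Calling the disown step at the end of recovery removes that hazard even when a second owner is not expected.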

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center