Re: Streaming Replication patch for CommitFest 2009-09

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication patch for CommitFest 2009-09
Date: 2009-09-17 11:32:59
Message-ID: 4AB21E6B.1020602@enterprisedb.com
Lists: pgsql-hackers

Some random comments:

I don't think we need the new PM_SHUTDOWN_3 postmaster state. We can
treat walsenders the same as the archive process, and kill and wait for
both of them to die in PM_SHUTDOWN_2 state.

I think there's something wrong with the napping in walsender. When I
run pg_switch_xlog(), it takes surprisingly long for the switch to trickle
to the standby. When I put a little proxy program in between the master
and slave that delays all messages from the slave to the master by one
second, it got worse, even though I would expect the master to still
keep sending WAL at full speed. I get logs like this:

2009-09-17 14:13:16.876 EEST LOG: xlog send request 0/38000000; send 0/3700006C; write 0/3700006C
2009-09-17 14:13:16.877 EEST LOG: xlog read request 0/37010000; send 0/37010000; write 0/3700006C
2009-09-17 14:13:17.077 EEST LOG: xlog send request 0/38000000; send 0/37010000; write 0/3700006C
2009-09-17 14:13:17.077 EEST LOG: xlog read request 0/37020000; send 0/37020000; write 0/3700006C
2009-09-17 14:13:17.078 EEST LOG: xlog read request 0/37030000; send 0/37030000; write 0/3700006C
2009-09-17 14:13:17.278 EEST LOG: xlog send request 0/38000000; send 0/37030000; write 0/3700006C
2009-09-17 14:13:17.279 EEST LOG: xlog read request 0/37040000; send 0/37040000; write 0/3700006C
...
2009-09-17 14:13:22.796 EEST LOG: xlog read request 0/37FD0000; send 0/37FD0000; write 0/376D0000
2009-09-17 14:13:22.896 EEST LOG: xlog send request 0/38000000; send 0/37FD0000; write 0/376D0000
2009-09-17 14:13:22.896 EEST LOG: xlog read request 0/37FE0000; send 0/37FE0000; write 0/376D0000
2009-09-17 14:13:22.896 EEST LOG: xlog read request 0/37FF0000; send 0/37FF0000; write 0/376D0000
2009-09-17 14:13:22.897 EEST LOG: xlog read request 0/38000000; send 0/38000000; write 0/376D0000
2009-09-17 14:14:09.932 EEST LOG: xlog send request 0/38000428; send 0/38000000; write 0/38000000
2009-09-17 14:14:09.932 EEST LOG: xlog read request 0/38000428; send 0/38000428; write 0/38000000
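
The spacing between entries can be measured directly from the timestamps
above. The following is only a sketch for checking the log, not part of
the patch: the function name naps_ms and the "send"/"read" labels are
made up for illustration, and the timestamps are copied from the log.

```python
from datetime import datetime

def naps_ms(entries):
    """Gap in milliseconds between each "send" entry and the entry before it."""
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    naps = []
    for (prev_ts, _), (ts, kind) in zip(entries, entries[1:]):
        if kind == "send":
            delta = datetime.strptime(ts, fmt) - datetime.strptime(prev_ts, fmt)
            naps.append(round(delta.total_seconds() * 1000))
    return naps

# (timestamp, kind) pairs taken from the log; the elided part is skipped.
entries = [
    ("2009-09-17 14:13:16.876", "send"),
    ("2009-09-17 14:13:16.877", "read"),
    ("2009-09-17 14:13:17.077", "send"),
    ("2009-09-17 14:13:17.077", "read"),
    ("2009-09-17 14:13:17.078", "read"),
    ("2009-09-17 14:13:17.278", "send"),
    ("2009-09-17 14:13:17.279", "read"),
    ("2009-09-17 14:13:22.796", "read"),
    ("2009-09-17 14:13:22.896", "send"),
]

print(naps_ms(entries))  # [200, 200, 100]
```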

It looks like it's napping for 100 or 200 ms in between. Also, I
wouldn't expect to see so many "read request" acknowledgments from the
slave. The master doesn't really need to know how far the slave has
gotten, except in synchronous replication when it has requested a flush
on the slave. The other reason the master needs to know is so that it can
recycle old log files, but for that we'd really only need an
acknowledgment once per WAL file or even less.
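
To make the "once per WAL file" idea concrete, here is a rough sketch of
such a policy. It is an illustration only, not code from the patch: the
names parse_lsn, segment_of and should_ack are hypothetical, the 16 MB
segment size is the default and is assumed here, and the location is
treated as a flat 64-bit byte position, which is a simplification.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed: default WAL segment size

def parse_lsn(text):
    """Convert 'hi/lo' hex notation, e.g. '0/37010000', to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def segment_of(lsn):
    """Index of the WAL segment containing the given byte position."""
    return lsn // WAL_SEGMENT_SIZE

def should_ack(last_acked, received):
    """Hypothetical policy: acknowledge only after crossing into a new segment."""
    return segment_of(parse_lsn(received)) > segment_of(parse_lsn(last_acked))

# Within segment 0x37 no acknowledgment is sent; crossing into 0x38 triggers one.
print(should_ack("0/37010000", "0/37020000"))  # False
print(should_ack("0/37FF0000", "0/38000000"))  # True
```

With the log above, this would reduce the 256 "read request" messages
for one file (one per 64 kB, judging by the locations logged) to a single
acknowledgment.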

Why does XLogSend() care about page boundaries? Perhaps it's a leftover
from the old approach that read from wal_buffers?

Do we really need the support for asynchronous backend libpq commands?
Could walsender just keep blasting WAL to the slave, and only try to
read an acknowledgment after it has requested one by setting the
XLOGSTREAM_FLUSH flag? Or maybe we should be putting the socket into
non-blocking mode.
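
The shape of such a send loop might look like the sketch below. It is
written in Python for brevity rather than against libpq, and it is not
how the patch works: poll_ack, stream_wal and flush_after are
hypothetical names, and select() with a zero timeout stands in for a
non-blocking read. The sender only waits when it has asked for an
acknowledgment, and otherwise just picks up whatever has already arrived.

```python
import select
import socket

def poll_ack(sock, timeout=0.0):
    """Return bytes waiting on sock, or b"" if nothing arrives within timeout."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        return b""
    return sock.recv(4096)

def stream_wal(sock, chunks, flush_after):
    """Send every chunk; wait for a reply only after indexes in flush_after."""
    acks = []
    for i, chunk in enumerate(chunks):
        sock.sendall(chunk)
        if i in flush_after:
            # An acknowledgment was requested here, so waiting is justified.
            acks.append(poll_ack(sock, timeout=5.0))
        else:
            # Otherwise never wait: collect a reply only if one is already there.
            ack = poll_ack(sock)
            if ack:
                acks.append(ack)
    return acks

# Local demonstration over a socket pair standing in for master and slave.
master, slave = socket.socketpair()
print(poll_ack(master))  # b'' because nothing has been sent back yet
```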

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
