Re: Comments to Synchronous replication patch v3

Lists: pgsql-hackers
From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Synchronous replication patch v2
Date: 2008-11-14 10:15:03
Message-ID: 3f0b79eb0811140215nd605e4u5ef56aee2ca6afea@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Attached is the latest version of Synch Rep patch. Sorry for my late posting.
I fixed some bugs in v1patch and integrated walreceiver into core. Attached
contain some patches:

* synch_rep_v2.patch
This is the whole patch of Synch Rep against head. This also contain
"Infrastructure changes for recover" patch by Simon because I'd like to
make walreceiver work in consistent recovery mode.

To be reviewed easily, I split synch_rep_v2.patch into 8 pieces.

* 01_signal_handling_v2.patch
This is the patch of signal handling changes for Synch Rep. I already
posted this.
http://archives.postgresql.org/message-id/3f0b79eb0811040404r799d3170v5bb9f201000f1771@mail.gmail.com

* 02_pqcomm_v1.patch
This is the patch of non-blocking pqcomm. Since walsender sends
xlog records and receive the reply from walreceiver concurrently, this
feature is needed.

* 03_libpq_v1.patch
This is the patch of new libpq API for Synch Rep. This is mainly used by
walreceiver. I didn't want to introduce a new communication layer,
I modified libpq for Synch Rep.

* 04_recovery_conf_v1.patch
This is the patch for postmaster to read recovery.conf like GUC.
Thereby, processes other than startup process can also get
parameters in recovery.conf. Walreceiver gets a host / port of a primary
server from recovery.conf using this feature.

* 05_split_xlogwrt_v1.patch
This is the patch of splitting XLogWrite into 3 pieces. Two of them are
used for walreceiver to write xlog records on the standby server.

* 06_walsender_v2.patch
This is the patch for sending xlog records. This contains
- walsender itself
- communication between walsender and backends
(other than signal handling changes)
- management of xlog positions

* 07_walreceiver_v1.patch
This is the patch of walreceiver which receives and writes xlog records.
Walreceiver works in consistent recovery mode.

* 08_arch_in_recovery_v1.patch
This is the patch for archiver to work in consistent recovery mode. In
the standby server, walreceiver writes xlog records to pg_xlog, and
pg_standby reads them from archive location. So, someone has to
archive them. I make walreceiver start archiver for archiving them.

If you want to apply 8 patches one by one, please apply "Infrastructure
changes for recover" patch and create the following empty files first.
- src/backend/postmaster/walsender.c
- src/backend/postmaster/walreceiver.c
- src/include/postmaster/walsender.h
- src/include/postmaster/walreceiver.h

Since source code comments and documents are insufficient,
I think I will mainly enrich them next week. And, if there is a patch
which is hard to be reviewed, I will split it further.

Any comments welcome!

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
synch_rep_patchset_v2.tgz application/x-gzip 86.2 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-14 17:00:02
Message-ID: 491DAE92.2030201@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> Attached is the latest version of Synch Rep patch.

Why do we need a separate XLogsndRqst variable in shared memory? Don't
we always want to send the WAL up to the same point as we flush it?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-14 17:07:34
Message-ID: 1226682454.27904.631.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Fri, 2008-11-14 at 19:00 +0200, Heikki Linnakangas wrote:
> Fujii Masao wrote:
> > Attached is the latest version of Synch Rep patch.
>
> Why do we need a separate XLogsndRqst variable in shared memory? Don't
> we always want to send the WAL up to the same point as we flush it?

If we're doing synch rep and we're committing. What happens when we're
doing async rep or running something like a large load. I wouldn't want
to presume that the network packet size and the disk write size are
always identical.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-14 17:23:51
Message-ID: 491DB427.6020507@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2008-11-14 at 19:00 +0200, Heikki Linnakangas wrote:
>> Why do we need a separate XLogsndRqst variable in shared memory? Don't
>> we always want to send the WAL up to the same point as we flush it?
>
> If we're doing synch rep and we're committing.

You flush and send the WAL, up to the same point?

> What happens when we're
> doing async rep or running something like a large load.

You don't flush, and you don't request the WAL to be sent? The
background writer and WAL sender can still wake up periodically, and
write and send the WAL as they find convenient.

> I wouldn't want
> to presume that the network packet size and the disk write size are
> always identical.

Huh? No-one's presuming that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-14 18:29:46
Message-ID: 1226687386.27904.645.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Fri, 2008-11-14 at 19:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Fri, 2008-11-14 at 19:00 +0200, Heikki Linnakangas wrote:
> >> Why do we need a separate XLogsndRqst variable in shared memory? Don't
> >> we always want to send the WAL up to the same point as we flush it?
> >
> > If we're doing synch rep and we're committing.
>
> You flush and send the WAL, up to the same point?

Yes, but you may make progress towards it in different size steps.

> > What happens when we're
> > doing async rep or running something like a large load.
>
> You don't flush, and you don't request the WAL to be sent? The
> background writer and WAL sender can still wake up periodically, and
> write and send the WAL as they find convenient.

With WAL writes we write and flush at the same time. With WAL sending
that doesn't sound such a good plan.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-15 01:29:39
Message-ID: 603c8f070811141729vf17f6fbja1ff9dbb88a019fe@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> This is the whole patch of Synch Rep against head. This also contain
> "Infrastructure changes for recover" patch by Simon because I'd like to
> make walreceiver work in consistent recovery mode.

Given that both this patch and Simon's hot standby patches need this
infrastructure, it seems like it would be awfully good if we could get
that piece into final form and committed. Tom Lane and Heikki both
reviewed this for the September commitfest and even into October, but
then, at least as I understand it, the reviewing train kind of ran out
of steam. I'm not sure what the best way forward is. Do we need to
assign a new reviewer, or do either of Tom and Heikki feel like
picking it up again?

...Robert


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-16 10:02:26
Message-ID: 3f0b79eb0811160202o6753ac55j597024992b380dd5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Nov 15, 2008 at 2:00 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> Attached is the latest version of Synch Rep patch.
>
> Why do we need a separate XLogsndRqst variable in shared memory? Don't we
> always want to send the WAL up to the same point as we flush it?

It's because the backends flush and send the WAL concurrently. I mean that
the backend might request for the WAL to be sent up to the further point while
another backend is writing it. If that variable is not separated, the backend
cannot request until the former flushing finishes.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-17 13:44:19
Message-ID: 49217533.2070204@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
>> This is the whole patch of Synch Rep against head. This also contain
>> "Infrastructure changes for recover" patch by Simon because I'd like to
>> make walreceiver work in consistent recovery mode.
>
> Given that both this patch and Simon's hot standby patches need this
> infrastructure, it seems like it would be awfully good if we could get
> that piece into final form and committed. Tom Lane and Heikki both
> reviewed this for the September commitfest and even into October, but
> then, at least as I understand it, the reviewing train kind of ran out
> of steam. I'm not sure what the best way forward is. Do we need to
> assign a new reviewer, or do either of Tom and Heikki feel like
> picking it up again?

I'll take a look at it, but the code is tricky enough that it would be a
good idea if others, including Tom, review it too.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-17 15:29:19
Message-ID: 1226935759.3790.7.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Mon, 2008-11-17 at 15:44 +0200, Heikki Linnakangas wrote:
> Robert Haas wrote:
> >> This is the whole patch of Synch Rep against head. This also contain
> >> "Infrastructure changes for recover" patch by Simon because I'd like to
> >> make walreceiver work in consistent recovery mode.
> >
> > Given that both this patch and Simon's hot standby patches need this
> > infrastructure, it seems like it would be awfully good if we could get
> > that piece into final form and committed. Tom Lane and Heikki both
> > reviewed this for the September commitfest and even into October, but
> > then, at least as I understand it, the reviewing train kind of ran out
> > of steam. I'm not sure what the best way forward is. Do we need to
> > assign a new reviewer, or do either of Tom and Heikki feel like
> > picking it up again?
>
> I'll take a look at it, but the code is tricky enough that it would be a
> good idea if others, including Tom, review it too.

Yes please to multiple reviewers.

The Hot Standby patch can be considered v9 of the infrastructure patch,
as mentioned on the wiki.

I'll look to separate them so we review the correct code.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-17 16:56:02
Message-ID: 4921A222.40307@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> The Hot Standby patch can be considered v9 of the infrastructure patch,
> as mentioned on the wiki.
>
> I'll look to separate them so we review the correct code.

Ok, I started reviewing the other v9
(http://archives.postgresql.org/message-id/1223472898.4747.310.camel@ebony.2ndQuadrant).
I'll look at other patches meanwhile you separate them again.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Synchronous replication patch v2
Date: 2008-11-17 17:10:32
Message-ID: 1226941832.3790.28.camel@ebony.2ndQuadrant
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Mon, 2008-11-17 at 18:56 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > The Hot Standby patch can be considered v9 of the infrastructure patch,
> > as mentioned on the wiki.
> >
> > I'll look to separate them so we review the correct code.
>
> Ok, I started reviewing the other v9
> (http://archives.postgresql.org/message-id/1223472898.4747.310.camel@ebony.2ndQuadrant).
> I'll look at other patches meanwhile you separate them again.

Arghhh! Sorry, slip of finger earlier. Latest Hot Standby is v10.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pavan(dot)deolasee(at)gmail(dot)com, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous replication patch v2
Date: 2008-11-18 03:08:26
Message-ID: 3f0b79eb0811171908q2434a0bfrda0964987bb1a83a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Fri, Nov 14, 2008 at 7:15 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> Attached is the latest version of Synch Rep patch. Sorry for my late posting.
> I fixed some bugs in v1patch and integrated walreceiver into core. Attached
> contain some patches:

Attached is v3 of Synch Rep patch. I changed the signal handling, as
pointed out in another thread.
http://archives.postgresql.org/pgsql-hackers/2008-11/msg01041.php

Other changes are;

* In v2, whenever XLogSend sent the WAL, it was initializing StringInfo,
but since it's useless, I changed it so that one StringInfo might be
used.

* In v2, since the WAL files were not recycled in the standby, fsynch by
walreceiver when initializing the WAL files (XLogFileInit) was one of
the bottleneck. So, I made the bgwriter recycle the old WAL files when
creating the restart point, and the startup process recycle the restored
file (RECOVERYXLOG) when trying to remove it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment Content-Type Size
synch_rep_patchset_v3.tgz application/x-gzip 95.6 KB

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Comments to Synchronous replication patch v3
Date: 2008-11-25 02:42:00
Message-ID: 20081125094837.7B2A.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Nov 14, 2008 at 7:15 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > Attached is the latest version of Synch Rep patch. Sorry for my late posting.
> > I fixed some bugs in v1patch and integrated walreceiver into core.

I have some comments about v3 patch.

[1] Should walsender database be real or not?
[2] User-configurable replication_timeout is dangerous
[3] How to configure wal_buffers and wal_sender_delay?
[4] XLogArchivingActive on replication mode
[5] Usage of XLog Send-Flush-Wait
[6] Bit-OR or Logical-OR

----
[1] Should walsender database be real or not?
Index: bin/initdb/initdb.c
+ "CREATE DATABASE walsender;\n",

You create walsender as a *real* database, but is it really needed?
Can we make it a virtual database only for walreceiver?

And backend process crashes when I login the walsender database:
$ psql walsender
psql: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

Even if we need to have the database in real, I'd like to use another
name for it. The name 'walsender' seems to be an internal module name
but it should be a feature name (ex. 'replication').

----
[2] User-configurable replication_timeout is dangerous
Index: backend/utils/misc/guc.c
+ {"replication_timeout", PGC_USERSET, WAL_REPLICATION,

You export replication_timeout as a PGC_USERSET variable, but it is
dangerous. It allows non-superusers to kill servers easily by setting it
too low value. Walsender dies with FATAL on timeout.

BTW, how about adding an option to choose FATAL or ERROR on timeout?
I think FATAL is too strong for some cases.

----
[3] How to configure wal_buffers and wal_sender_delay?
The parameter wal_buffers might be more important in synch rep,
but we don't have a mean to know whether wal_buffers is enough or not.
Should we have a new system view 'pg_stat_xlog' to find shortage
of wal_buffers and wal_sender_delay?

----
[4] XLogArchivingActive on replication mode
XLogArchivingActive is not modified in the patch, but is it safe?
For example, we might need to disable COPY-without-wal optimization
when replication is enabled.

----
[5] Usage of XLog Send-Flush-Wait
There are many following sequences:
RequestXLogSend(WriteRqstPtr, true);
XLogFlush(WriteRqstPtr);
WaitXLogSend(WriteRqstPtr, false, true);

It introduces additional complexities to use XLogFlush.
Is it possible to push RequestXLogSend and WaitXLogSend into XLogFlush?
It might require a new flag argument in XLogFlush.

----
[6] Bit-OR or Logical-OR
WaitXLogSend(XactLastRecEnd, false, forceSyncCommit | haveNonTemp);
should be
WaitXLogSend(XactLastRecEnd, false, forceSyncCommit || haveNonTemp);
| is bit OR and || is logical OR.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 09:03:03
Message-ID: 3f0b79eb0811250103v390f141dsb8db1458940bbf36@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 25, 2008 at 11:42 AM, ITAGAKI Takahiro
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Fri, Nov 14, 2008 at 7:15 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > Attached is the latest version of Synch Rep patch. Sorry for my late posting.
>> > I fixed some bugs in v1patch and integrated walreceiver into core.
>
> I have some comments about v3 patch.

Hi, Itagaki-san. Thanks for taking time to review the patch.

>
> [1] Should walsender database be real or not?
> [2] User-configurable replication_timeout is dangerous
> [3] How to configure wal_buffers and wal_sender_delay?
> [4] XLogArchivingActive on replication mode
> [5] Usage of XLog Send-Flush-Wait
> [6] Bit-OR or Logical-OR
>
> ----
> [1] Should walsender database be real or not?
> Index: bin/initdb/initdb.c
> + "CREATE DATABASE walsender;\n",
>
> You create walsender as a *real* database, but is it really needed?
> Can we make it a virtual database only for walreceiver?

I think that the real database is better, because it's simpler and the
user can re-create it easily even if it is lost accidentally. Of course,
if we introduce the new API to handle a virtual database, we can
virtualize it. Is it worth introducing it?

>
> And backend process crashes when I login the walsender database:
> $ psql walsender
> psql: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.

Obviously undesirable behavior :(
I will fix it.

>
> Even if we need to have the database in real, I'd like to use another
> name for it. The name 'walsender' seems to be an internal module name
> but it should be a feature name (ex. 'replication').

Agreed. The name 'replication' is more suitable, I also think.
Any other ideas?

>
> ----
> [2] User-configurable replication_timeout is dangerous
> Index: backend/utils/misc/guc.c
> + {"replication_timeout", PGC_USERSET, WAL_REPLICATION,
>
> You export replication_timeout as a PGC_USERSET variable, but it is
> dangerous. It allows non-superusers to kill servers easily by setting it
> too low value. Walsender dies with FATAL on timeout.
>
> BTW, how about adding an option to choose FATAL or ERROR on timeout?
> I think FATAL is too strong for some cases.

OK, I will add the new GUC option for specifying the behavior when the
timeout occurs. I think the valid values are ERROR, FATAL and PANIC.

* ERROR only cancels that the backend waits for replication. The backends
(including the canceled one) and walsender continue processing. But,
synchronous replication might be broken.

* FATAL terminates walsender. The backends continue processing without
replication.

* PANIC terminates all processes. In previous discussion, someone wanted
this behavior.

Or, should I define replication_timeout as PGC_SUSET?

>
> ----
> [3] How to configure wal_buffers and wal_sender_delay?
> The parameter wal_buffers might be more important in synch rep,
> but we don't have a mean to know whether wal_buffers is enough or not.
> Should we have a new system view 'pg_stat_xlog' to find shortage
> of wal_buffers and wal_sender_delay?

Couting the number of times which the buffer page is replaced in
AdvanceXLInsertBuffer might be help to configure them. Of course,
this kind of feature is very useful. But, is it essential for Synch Rep?
If not, I'd like to postpone it to 8.5.

>
> ----
> [4] XLogArchivingActive on replication mode
> XLogArchivingActive is not modified in the patch, but is it safe?
> For example, we might need to disable COPY-without-wal optimization
> when replication is enabled.

You are right! I will fix it.

>
> ----
> [5] Usage of XLog Send-Flush-Wait
> There are many following sequences:
> RequestXLogSend(WriteRqstPtr, true);
> XLogFlush(WriteRqstPtr);
> WaitXLogSend(WriteRqstPtr, false, true);
>
> It introduces additional complexities to use XLogFlush.
> Is it possible to push RequestXLogSend and WaitXLogSend into XLogFlush?
> It might require a new flag argument in XLogFlush.

OK, I will push them into XLogFlush.

>
> ----
> [6] Bit-OR or Logical-OR
> WaitXLogSend(XactLastRecEnd, false, forceSyncCommit | haveNonTemp);
> should be
> WaitXLogSend(XactLastRecEnd, false, forceSyncCommit || haveNonTemp);
> | is bit OR and || is logical OR.

Oops! I will fix it.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 11:38:29
Message-ID: 492BE3B5.3020800@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao escreveu:
> (...)
>> Even if we need to have the database in real, I'd like to use another
>> name for it. The name 'walsender' seems to be an internal module name
>> but it should be a feature name (ex. 'replication').
>>
>
> Agreed. The name 'replication' is more suitable, I also think.
> Any other ideas?
>

'walsender' should be a schema in the 'replication' database. Other
modules, in replication feature, could be placed there too.

[]s

--
Dickson S. Guedes
Administrador de Banco de Dados
Confesol - Projeto Colmeia
Florianopolis, SC, Brasil
(48) 3322-1185, ramal: 26


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 13:57:50
Message-ID: 20081125135750.GC4875@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dickson S. Guedes escribió:
> Fujii Masao escreveu:
>> (...)
>>> Even if we need to have the database in real, I'd like to use another
>>> name for it. The name 'walsender' seems to be an internal module name
>>> but it should be a feature name (ex. 'replication').
>>>
>>
>> Agreed. The name 'replication' is more suitable, I also think.
>> Any other ideas?
>
> 'walsender' should be a schema in the 'replication' database. Other
> modules, in replication feature, could be placed there too.

Hmm, what is this database there for?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 15:17:57
Message-ID: 3f0b79eb0811250717o502e9754ra4b9009d42d78525@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 25, 2008 at 10:57 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Dickson S. Guedes escribió:
>> Fujii Masao escreveu:
>>> (...)
>>>> Even if we need to have the database in real, I'd like to use another
>>>> name for it. The name 'walsender' seems to be an internal module name
>>>> but it should be a feature name (ex. 'replication').
>>>>
>>>
>>> Agreed. The name 'replication' is more suitable, I also think.
>>> Any other ideas?
>>
>> 'walsender' should be a schema in the 'replication' database. Other
>> modules, in replication feature, could be placed there too.
>
> Hmm, what is this database there for?

It's for authentication for replication. This was discussed before.
Please see the following thread and feel free to comment.
http://archives.postgresql.org/pgsql-hackers/2008-11/msg00187.php

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 15:23:45
Message-ID: 20081125152345.GE4875@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao escribió:
> On Tue, Nov 25, 2008 at 10:57 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Dickson S. Guedes escribió:
> >> Fujii Masao escreveu:
> >>> (...)
> >>>> Even if we need to have the database in real, I'd like to use another
> >>>> name for it. The name 'walsender' seems to be an internal module name
> >>>> but it should be a feature name (ex. 'replication').
> >>>>
> >>>
> >>> Agreed. The name 'replication' is more suitable, I also think.
> >>> Any other ideas?
> >>
> >> 'walsender' should be a schema in the 'replication' database. Other
> >> modules, in replication feature, could be placed there too.
> >
> > Hmm, what is this database there for?
>
> It's for authentication for replication. This was discussed before.
> Please see the following thread and feel free to comment.
> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00187.php

Hmm ... I think this means that the suggestion by Dickson does not make
much sense, right?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 16:40:41
Message-ID: 3f0b79eb0811250840i544e6be0t86db5ce1b8da0b8e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 26, 2008 at 12:23 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Fujii Masao escribió:
>> On Tue, Nov 25, 2008 at 10:57 PM, Alvaro Herrera
>> <alvherre(at)commandprompt(dot)com> wrote:
>> > Dickson S. Guedes escribió:
>> >> Fujii Masao escreveu:
>> >>> (...)
>> >>>> Even if we need to have the database in real, I'd like to use another
>> >>>> name for it. The name 'walsender' seems to be an internal module name
>> >>>> but it should be a feature name (ex. 'replication').
>> >>>>
>> >>>
>> >>> Agreed. The name 'replication' is more suitable, I also think.
>> >>> Any other ideas?
>> >>
>> >> 'walsender' should be a schema in the 'replication' database. Other
>> >> modules, in replication feature, could be placed there too.
>> >
>> > Hmm, what is this database there for?
>>
>> It's for authentication for replication. This was discussed before.
>> Please see the following thread and feel free to comment.
>> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00187.php
>
> Hmm ... I think this means that the suggestion by Dickson does not make
> much sense, right?

Oh, I'm sorry!

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-25 18:12:50
Message-ID: 492C4022.5090806@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Fujii Masao escribió:
>> On Tue, Nov 25, 2008 at 10:57 PM, Alvaro Herrera
>> <alvherre(at)commandprompt(dot)com> wrote:
>>> Dickson S. Guedes escribió:
>>>> Fujii Masao escreveu:
>>>>> (...)
>>>>>> Even if we need to have the database in real, I'd like to use another
>>>>>> name for it. The name 'walsender' seems to be an internal module name
>>>>>> but it should be a feature name (ex. 'replication').
>>>>>>
>>>>> Agreed. The name 'replication' is more suitable, I also think.
>>>>> Any other ideas?
>>>> 'walsender' should be a schema in the 'replication' database. Other
>>>> modules, in replication feature, could be placed there too.
>>> Hmm, what is this database there for?
>> It's for authentication for replication. This was discussed before.
>> Please see the following thread and feel free to comment.
>> http://archives.postgresql.org/pgsql-hackers/2008-11/msg00187.php
>
> Hmm ... I think this means that the suggestion by Dickson does not make
> much sense, right?

Right.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: David Fetter <david(at)fetter(dot)org>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-26 19:00:47
Message-ID: 20081126190047.GO19464@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 25, 2008 at 12:23:45PM -0300, Alvaro Herrera wrote:
> Fujii Masao escribió:
> > On Tue, Nov 25, 2008 at 10:57 PM, Alvaro Herrera
> > <alvherre(at)commandprompt(dot)com> wrote:
> > > Dickson S. Guedes escribió:
> > >> Fujii Masao escreveu:
> > >>> (...)
> > >>>> Even if we need to have the database in real, I'd like to use
> > >>>> another name for it. The name 'walsender' seems to be an
> > >>>> internal module name but it should be a feature name (ex.
> > >>>> 'replication').
> > >>>>
> > >>>
> > >>> Agreed. The name 'replication' is more suitable, I also think.
> > >>> Any other ideas?
> > >>
> > >> 'walsender' should be a schema in the 'replication' database.
> > >> Other modules, in replication feature, could be placed there
> > >> too.
> > >
> > > Hmm, what is this database there for?
> >
> > It's for authentication for replication. This was discussed
> > before. Please see the following thread and feel free to comment.
> > http://archives.postgresql.org/pgsql-hackers/2008-11/msg00187.php
>
> Hmm ... I think this means that the suggestion by Dickson does not
> make much sense, right?

It sounds to me like this should use SQL/MED connections, if it's
holding auth information :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-26 19:15:49
Message-ID: 492DA065.6010500@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter wrote:
> It sounds to me like this should use SQL/MED connections, if it's
> holding auth information :)

No, the SQL/MED stuff holds authentication information to authenticate
to other data sources. This is about authentication of *incoming*
connections.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: David Fetter <david(at)fetter(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, "Dickson S(dot) Guedes" <guediz(at)gmail(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-26 19:57:20
Message-ID: 20081126195720.GA1014@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 26, 2008 at 09:15:49PM +0200, Heikki Linnakangas wrote:
> David Fetter wrote:
>> It sounds to me like this should use SQL/MED connections, if it's
>> holding auth information :)
>
> No, the SQL/MED stuff holds authentication information to authenticate
> to other data sources. This is about authentication of *incoming*
> connections.

Thanks for clearing that up :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-28 03:27:02
Message-ID: 3f0b79eb0811271927q595f1587qde5a6e4a641dd731@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

On Tue, Nov 25, 2008 at 6:03 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> [2] User-configurable replication_timeout is dangerous
>> Index: backend/utils/misc/guc.c
>> + {"replication_timeout", PGC_USERSET, WAL_REPLICATION,
>>
>> You export replication_timeout as a PGC_USERSET variable, but it is
>> dangerous. It allows non-superusers to kill servers easily by setting it
>> too low value. Walsender dies with FATAL on timeout.

Unlike other background processes, FATAL by walsender doesn't kill the
whole server. In FATAL case, walsender is treated like the normal backend,
and only walsender dies. Please see reaper() in postmaster.c.

Just to be safe, I re-export the parameter as PGC_SUSET in the latest
patch. Is still this parameter dangerous?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-11-28 16:10:27
Message-ID: 26328.1227888627@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Fujii Masao" <masao(dot)fujii(at)gmail(dot)com> writes:
>>> You export replication_timeout as a PGC_USERSET variable, but it is
>>> dangerous. It allows non-superusers to kill servers easily by setting it
>>> too low value. Walsender dies with FATAL on timeout.

> Unlike other background processes, FATAL by walsender doesn't kill the
> whole server. In FATAL case, walsender is treated like the normal backend,
> and only walsender dies. Please see reaper() in postmaster.c.

> Just to be safe, I re-export the parameter as PGC_SUSET in the latest
> patch. Is still this parameter dangerous?

If this parameter is only used by a background process, then both of
those are wrong. It should be marked SIGHUP to reflect the fact that
the only effective way of modifying it is via postgresql.conf.

regards, tom lane


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Comments to Synchronous replication patch v3
Date: 2008-12-01 02:17:16
Message-ID: 3f0b79eb0811301817u7fafdffbx387fa5b27b838f92@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, thanks for the comment!

On Sat, Nov 29, 2008 at 1:10 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com> writes:
>>>> You export replication_timeout as a PGC_USERSET variable, but it is
>>>> dangerous. It allows non-superusers to kill servers easily by setting it
>>>> too low value. Walsender dies with FATAL on timeout.
>
>> Unlike other background processes, FATAL by walsender doesn't kill the
>> whole server. In FATAL case, walsender is treated like the normal backend,
>> and only walsender dies. Please see reaper() in postmaster.c.
>
>> Just to be safe, I re-export the parameter as PGC_SUSET in the latest
>> patch. Is still this parameter dangerous?
>
> If this parameter is only used by a background process, then both of
> those are wrong. It should be marked SIGHUP to reflect the fact that
> the only effective way of modifying it is via postgresql.conf.

This parameter is used by the backend to wait for replication to complete
for the specified time at most. If the timeout has come, the backend send
the timtout-interrupt to walsender. Then walsender invokes FATAL error,
which doesn't kill the whole server. So, I choosed PGC_USERSET at first.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center