Re: SynchRep; wait-forever and shutdown

Lists: pgsql-hackers
From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: SynchRep; wait-forever and shutdown
Date: 2010-12-10 04:54:55
Message-ID: AANLkTin3kgcMxTjGqpD25sg2T8gPs6UAqeBq+HQJjB7G@mail.gmail.com

Hi,

In the previous discussion, some people wanted a "wait-forever" option that
blocks all transactions on the master until a sync'd standby has appeared,
in order to reduce the risk of data loss in synchronous replication.

What I'm not clear on is: how does a smart or fast shutdown proceed while all
the transactions are being blocked?

1. Should shutdown wait until all the transactions end, i.e. until a
    sync'd standby appears?
    * The problem is that shutdown could take a very long time.

2. Should shutdown commit all the blocked transactions?
    * The problem is that a client thinks those transactions have been
       committed successfully even though they have not been replicated to
       the standby.

3. Should shutdown abort all the blocked transactions?
    * The problem is that a client thinks those transactions have been
       aborted even though their WAL records have been written on the master.
       But this is a very common problem for a DBMS, so we don't need to
       worry about it in the context of replication.

ISTM smart and fast shutdown fit with #1 and #3, respectively. Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SynchRep; wait-forever and shutdown
Date: 2010-12-10 17:54:46
Message-ID: 4D026966.7020203@agliodbs.com


> 3. Should shutdown abort all the blocked transactions?
>     * The problem is that a client thinks those transactions have been
>        aborted even though their WAL records have been written on the master.
>        But this is a very common problem for a DBMS, so we don't need to
>        worry about it in the context of replication.

Hmmm. The WAL records are written as committed ... this is why people
get into 2PC if they want fully synchronous replication. Short of using 2PC,
there is simply no way we can guarantee that the master and the standby won't
get out of sync. And even 2PC isn't perfect.

I think the best we can do is have the master abort the sessions and
shut down for a -fast. Yes, the clients are confused about what's been
committed, but frequently that's the case with a -fast anyway.

However, we need to give the user more information. I'd say that we
need to have a specific error message associated with a synchronization
failure around shutdown time. This error should be both returned to the
clients, and logged. That way the DBA can decide what to do about the
error, if anything.

So, I'd say this is the way to go:
Shutdown Smart:
    Wait for all pending standby transactions to clear.
    After 60 seconds, emit an error message on the shutdown console:
        NOTICE: pending replication transactions still waiting
    ... that way the DBA knows to move on to -fast

Shutdown Fast:
    Wait for 1 second for all pending standby transactions to clear.
    If they don't clear, emit an error to both the shutdown console
    and the client consoles:
        WARNING: some transactions not replicated
    Send a commit message on the client consoles
    Shut down.
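
The two modes proposed above can be sketched as a small Python simulation.
This is only an illustration of the proposal, not PostgreSQL code: the names
run_shutdown, FakeClock and poll_s are hypothetical, the 60-second and
1-second limits are taken from the list above, and the client-side warning
and commit message are left out for brevity.

```python
from typing import Callable, List

SMART_NOTICE_AFTER_S = 60.0  # smart: remind the DBA after this long
FAST_GRACE_S = 1.0           # fast: wait at most this long


class FakeClock:
    """Hypothetical test clock, so the sketch runs without real waiting."""

    def __init__(self) -> None:
        self.t = 0.0

    def now(self) -> float:
        return self.t

    def sleep(self, seconds: float) -> None:
        self.t += seconds


def run_shutdown(mode: str, pending: Callable[[], int],
                 clock: FakeClock, poll_s: float = 0.25) -> List[str]:
    """Return the messages emitted on the shutdown console.

    pending() is an assumed callback that reports how many commits are
    still waiting for the standby.
    """
    messages: List[str] = []
    start = clock.now()
    if mode == "smart":
        noticed = False
        # Smart: keep waiting until every pending transaction clears.
        while pending() > 0:
            if not noticed and clock.now() - start >= SMART_NOTICE_AFTER_S:
                messages.append(
                    "NOTICE: pending replication transactions still waiting")
                noticed = True
            clock.sleep(poll_s)
    elif mode == "fast":
        # Fast: give the standby a short grace period, then give up.
        while pending() > 0 and clock.now() - start < FAST_GRACE_S:
            clock.sleep(poll_s)
        if pending() > 0:
            messages.append("WARNING: some transactions not replicated")
    else:
        raise ValueError(f"unknown shutdown mode: {mode}")
    return messages
```

With a standby that never appears, run_shutdown("fast", lambda: 3, FakeClock())
gives up after the grace period and returns only the WARNING line, while the
smart mode keeps waiting and emits the NOTICE once.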

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SynchRep; wait-forever and shutdown
Date: 2010-12-14 00:41:05
Message-ID: AANLkTi=c_TFJSCrZcW0W2MaEYuxVPkKHxn_jLN8qy+pS@mail.gmail.com

On Thu, Dec 9, 2010 at 11:54 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> In the previous discussion, some people wanted a "wait-forever" option that
> blocks all transactions on the master until a sync'd standby has appeared,
> in order to reduce the risk of data loss in synchronous replication.
>
> What I'm not clear on is: how does a smart or fast shutdown proceed while all
> the transactions are being blocked?
>
> 1. Should shutdown wait until all the transactions end, i.e. until a
>     sync'd standby appears?
>     * The problem is that shutdown could take a very long time.
>
> 2. Should shutdown commit all the blocked transactions?
>     * The problem is that a client thinks those transactions have been
>        committed successfully even though they have not been replicated to
>        the standby.
>
> 3. Should shutdown abort all the blocked transactions?
>     * The problem is that a client thinks those transactions have been
>        aborted even though their WAL records have been written on the master.
>        But this is a very common problem for a DBMS, so we don't need to
>        worry about it in the context of replication.
>
> ISTM smart and fast shutdown fit with #1 and #3, respectively. Thoughts?

I might be missing something, but I don't see why this case requires
any special handling. As far as I can see, #2 and #3 are nonsense:
the client isn't waiting on the commit per se, but rather the
acknowledgment of the commit. In a smart shutdown, we wait for all
clients to disconnect. If they never disconnect, we never shut down.
It's a lame behavior and we might want to change it some day - at
least by adding a timeout - but I don't see any reason to change it
because of synchronous replication per se. In a fast shutdown, we
boot all clients off immediately. If they were waiting for an
acknowledgment, they don't get it. The application has to handle this
case, just as it does today if it sends a COMMIT command and the
connection is disconnected before it receives a response.
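
To illustrate the last point, here is a minimal Python sketch of one way an
application might deal with a commit whose acknowledgment never arrives. It
is an assumption for illustration, not something prescribed in this thread:
FakeServer, ConnectionLost and commit_with_check are hypothetical names, not
a real driver API, and the idea of looking up a per-transaction marker after
reconnecting is just one common approach.

```python
class ConnectionLost(Exception):
    """Raised when the link drops before a reply arrives."""


class FakeServer:
    """Hypothetical stand-in for a database server; not a real driver API."""

    def __init__(self, drop_ack: bool = False,
                 commit_survives: bool = True) -> None:
        self.committed = set()
        self.drop_ack = drop_ack
        self.commit_survives = commit_survives

    def commit(self, txn_id: str) -> str:
        if self.drop_ack:
            # The commit may or may not have become durable before the
            # client was disconnected; the client cannot tell which.
            if self.commit_survives:
                self.committed.add(txn_id)
            raise ConnectionLost("disconnected before COMMIT reply")
        self.committed.add(txn_id)
        return "COMMIT"

    def is_committed(self, txn_id: str) -> bool:
        # Stands in for reconnecting and looking up a marker row that
        # the transaction wrote under its own identifier.
        return txn_id in self.committed


def commit_with_check(server: FakeServer, txn_id: str) -> str:
    """Commit, resolving an unknown outcome by checking for the marker."""
    try:
        server.commit(txn_id)
        return "committed"
    except ConnectionLost:
        # No acknowledgment: the outcome is unknown until we look.
        if server.is_committed(txn_id):
            return "committed"
        return "not committed"
```

The application treats a lost acknowledgment as "unknown" rather than as a
failure, and only decides whether to retry after checking.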

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company