Re: Sync Rep v17

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Yeb Havinga <yebhavinga(at)gmail(dot)com>
Cc: Jaime Casanova <jaime(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org, Daniel Farina <daniel(at)heroku(dot)com>
Subject: Re: Sync Rep v17
Date: 2011-02-28 18:39:48
Message-ID: 1298918388.12992.1714.camel@ebony
Lists: pgsql-hackers

On Mon, 2011-02-28 at 10:31 +0100, Yeb Havinga wrote:

> 1) no automatic switch to other synchronous standby
> - start master server, add synchronous standby 1
> - change allow_standalone_primary to off
> - add second synchronous standby
> - wait until pg_stat_replication shows both standbys are in STREAMING state
> - stop standby 1
> what happens is that the master stalls, where I expected that it
> would've switched to standby 2 to acknowledge commits.
>
> The following thing was pilot error, but since I was test-piloting a new
> plane, I still think it might be useful feedback. In my opinion, any
> number and order of pg_ctl stops and starts on both the master and
> standby servers, as long as they are not with -m immediate, should never
> cause the state I reached.

The behaviour of "allow_standalone_primary = off" is pretty much
untested and does seem to have various gotchas in there.
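
For reference, the behaviour expected in (1) can be written down as a toy
model. This is only a sketch of the expectation, not code from the patch:
the class ToyPrimary and its methods are hypothetical names, and the only
identifier taken from this thread is allow_standalone_primary.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPrimary:
    """Toy model only; none of these names exist in PostgreSQL or the patch."""
    allow_standalone_primary: bool
    sync_standbys: List[str] = field(default_factory=list)

    def stop_standby(self, name: str) -> None:
        # A stopped standby can no longer acknowledge commits.
        self.sync_standbys.remove(name)

    def commit_outcome(self) -> str:
        if self.sync_standbys:
            # Expectation from (1): the next connected standby takes over.
            return "acknowledged by " + self.sync_standbys[0]
        if self.allow_standalone_primary:
            return "committed without standby acknowledgement"
        return "waiting for a synchronous standby"


primary = ToyPrimary(allow_standalone_primary=False,
                     sync_standbys=["standby1", "standby2"])
primary.stop_standby("standby1")
print(primary.commit_outcome())  # prints "acknowledged by standby2"
```

In this model the master would stall only once the last synchronous
standby has gone away, which is the outcome the test in (1) was expecting.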

> 2) reaching some sort of shutdown deadlock state
> - start master server, add synchronous standby
> - change allow_standalone_primary to off
> then I did all sorts of test things, everything still ok. Then I wanted
> to shutdown everything, and maybe because of some symmetry (stack like)
> I did the following because I didn't think it through
> - pg_ctl stop on standby (didn't actually wait until done, but
> immediately in other terminal)
> - pg_ctl stop on master
> Oh wait... master needs to sync transactions
> - start standby again. but now: FATAL: the database system is shutting down
>
> There is no clean way to get out of this situation.
> allow_standalone_primary in the face of shutdowns might be tricky. Maybe
> the master must be prohibited from entering the shutting down phase when
> allow_standalone_primary = off and no sync standby is connected; that
> would allow the sync standby to attach again.

The behaviour of "allow_standalone_primary = off" is not something I'm
worried about personally and I've argued all along it sounds pretty
silly to me. If someone wants to spend some time defining how it
*should* work that might help matters. I'm inclined to remove it before
commit if it can't work cleanly, to be re-added at a later date if it
makes sense.
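
As one possible starting point for such a definition, the rule suggested
in (2) can be stated as a small predicate. This is a sketch under
assumptions, not a proposal for the patch: may_begin_shutdown and its
arguments are hypothetical names, and the rule is taken directly from the
quoted suggestion.

```python
def may_begin_shutdown(allow_standalone_primary: bool,
                       connected_sync_standbys: int) -> bool:
    """Hypothetical guard: may the master enter the shutting down phase?"""
    if allow_standalone_primary:
        # The master never has to wait for a standby, so shutdown is safe.
        return True
    # With allow_standalone_primary = off, refuse to start shutting down
    # until a sync standby is attached, so that it can still connect.
    return connected_sync_standbys > 0


# The sequence from (2): standby already stopped, then pg_ctl stop on master.
print(may_begin_shutdown(allow_standalone_primary=False,
                         connected_sync_standbys=0))  # prints False
```

Under this rule the master would stay up in the situation described in
(2), so the restarted standby would not be rejected with "the database
system is shutting down".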

>
> 3) PANIC on standby server
> At some point a standby suddenly disconnected after I started a new
> pgbench run on an existing master/standby pair, with the following error
> in the logfile.
>
> LOCATION: libpqrcv_connect, libpqwalreceiver.c:171
> PANIC: XX000: heap_update_redo: failed to add tuple
> CONTEXT: xlog redo hot_update: rel 1663/16411/16424; tid 305453/15; new
> 305453/102
> LOCATION: heap_xlog_update, heapam.c:4724
> LOG: 00000: startup process (PID 32597) was terminated by signal 6: Aborted
>
> This might be due to pilot error as well; I did several tests over the
> weekend and after this error I was more alert on remembering immediate
> shutdowns/starting with a clean backup after that, and didn't see
> similar errors since.

Good. There are no changes in the patch for that section of code.

> 4) The performance of the syncrep seems to be quite an improvement over
> the previous syncrep patches, I've seen tps-ses of O(650) where the
> others were more like O(20). The O(650) tps is limited by the speed of
> the standby server I used; at several times the master would halt only
> because of heavy disk activity at the standby. A warning in the docs
> might be right: be sure to use good IO hardware for your synchronous
> replicas! With that bottleneck gone, I suspect the current syncrep
> version can go beyond 1000tps over 1 Gbit.

Good, thanks.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
