Re: Hot standby, overflowed snapshots, testing

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Hodges <robert(dot)hodges(at)continuent(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hot standby, overflowed snapshots, testing
Date: 2009-11-15 10:25:04
Message-ID: 1258280704.14054.1065.camel@ebony
Lists: pgsql-hackers

On Sat, 2009-11-14 at 08:43 -0800, Robert Hodges wrote:

> I can help set up automated basic tests for hot standby using 1+1 setups on
> Amazon. I'm already working on tests for warm standby for our commercial
> Tungsten implementation and need to solve the problem of creating tests that
> adapt flexibly across different replication mechanisms.

I didn't leap immediately to say yes for a couple of reasons.

More than 50% of the bugs found on HS now have been theoretical-ish
issues that would be very difficult to observe, let alone isolate with
black box testing. In many cases they are unlikely to happen, but that
is not our approach to quality. This shows there isn't a good substitute
for very long explanatory comments which are then read and challenged by
a reviewer, though I would note Heikki's particular skill in doing that.

The second most frequent class of bugs have been "unit test" bugs, where
the modules themselves need better unit testing. Black box testing only
works to address this when there is an exhaustive test-coverage driven
approach, but even then it's hard to inject real/appropriate conditions
into many deeply buried routines. Best way seems to be just multiple
debugger sessions and lots of time.

HS is characterised by a very low "additional feature" profile. It
leverages many existing modules to create something on the standby that
already exists on the primary. So in many ways it is a very different
sort of patch to many others.

There have been a few dumb-ass bugs and I hold my hand up to those,
though the reason is to do with timing of patch delivery and testing. I
don't see any long term issues, just unfortunate short term circumstance
because of patch churn.

--
Simon Riggs www.2ndQuadrant.com
