Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Date: 2012-02-25 20:46:54
Message-ID: CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Feb 21, 2012 at 5:34 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Tue, Feb 21, 2012 at 8:19 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sat, Feb 18, 2012 at 12:36 AM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Attached is a new version, fixing that, and off-by-one bug you pointed out
>>> in the slot wraparound handling. I also moved code around a bit, I think
>>> this new division of labor between the XLogInsert subroutines is more
>>> readable.
>
> When I ran the long-running performance test, I encountered the following
> panic error.
>
>    PANIC:  could not find WAL buffer for 0/FF000000

I too see this panic when the system survives long enough to get to
that log switch.

But I'm also still seeing (with version 9) the assert failure at
"xlog.c", Line: 2154 during the end-of-recovery checkpoint.

Here is a setup for repeating my tests. I used this test simply
because I had it sitting around after having written it for other
purposes; indeed I'm not all that sure I should publish it.
Hopefully other people will write other tests which exercise other
corner cases, rather than exercising the same ones I do.

The patch creates a GUC which causes the md writer routine to panic
and bring down the database, triggering recovery, after a given number
of writes. In this context, probably any other method of forcing a
crash and recovery would be just as good as this specific method of
crashing.

The choice of 400 for the cutoff for crashing is based on:

1) If the number is too low, you re-crash within recovery, so you never
get a chance to inspect the database. In my hands, recovery doesn't
need to do more than 400 writes. (I don't know how to make the
database use a different GUC setting during recovery than it used
before the crash.)

2) If the number is too high, it takes too long for a crash to happen
and I'm not all that patient.

Some of the changes to postgresql.conf.sample are purely my
preferences and have nothing in particular to do with this setup.
But archive_timeout = 30 is necessary in order to get checkpoints, and
thus mdwrites, to happen often enough to trigger crashes frequently
enough to satisfy my impatience.

The Perl script exercises the integrity of the database by launching
multiple processes (4 by default) to run updates and memorize what
updates they have run. After a crash, the Perl processes all
communicate their data up to the parent, which consolidates that
information and then queries the post-recovery database to make sure
it agrees. Transactions that are in flight at the time of a crash are
indeterminate: maybe the crash happened before the commit, and maybe
it happened after the commit but before we received notification of
the commit. So whichever way those turn out, it is not proof of
corruption.
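
In outline, each worker process does something like the following.
This is just a stripped-down sketch of the idea, not the attached
count.pl: the "counts" table, its columns, and the use of DBI are made
up here for illustration, and in the real script the workers run in
separate processes and ship this bookkeeping back to the parent rather
than simply returning it.

use strict;
use warnings;
use DBI;

# One worker: repeatedly increment a random row and remember what it did.
# The update that is in flight when the server goes down is indeterminate,
# since we cannot know whether its commit survived.
sub run_worker {
    my ($dbname, $nrows, $iters) = @_;
    my %committed;    # row id => increments whose commit was acknowledged
    my $in_flight;    # row id of the update whose fate is unknown

    my $dbh = DBI->connect("dbi:Pg:dbname=$dbname", '', '',
                           { RaiseError => 1, AutoCommit => 1, PrintError => 0 });
    for (1 .. $iters) {
        my $id = int(rand($nrows));
        $in_flight = $id;
        eval {
            $dbh->do('UPDATE counts SET n = n + 1 WHERE id = ?', undef, $id);
        };
        if ($@) {
            # The server went away mid-update: report what we know and stop.
            return (\%committed, $in_flight);
        }
        $committed{$id}++;    # acknowledged, so it must survive recovery
        $in_flight = undef;
    }
    return (\%committed, undef);
}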

With the xloginsert-scale-9.patch, the above features are not needed,
because the problem is not that the database is incorrect after
recovery, but that the database doesn't recover in the first place, so
just running pgbench would be good enough to detect it. But with
earlier versions this feature did detect incorrect recovery.

The test logs an awful lot of stuff, most of which merely indicates
normal operation. The problem is that corruption is rare, so if you
wait until you see corruption before turning on logging, then you have
to wait a long time to get another instance of corruption so you can
dissect the log information. So I just log everything all of the
time.

A warning from 'line 63' that is not marked as in flight indicates
database corruption. A warning from 'line 66' indicates even worse
corruption. A failure of the entire outer script to execute for the
expected number of iterations (i.e. failure of the warning issued on
'line 18' to show up 100 times) indicates that the database failed to
restart.
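
The consolidation and checking in the parent amounts to something like
this (again only a sketch under the same made-up schema as above, so it
has nothing to do with the line numbers quoted from count.pl; it
assumes counts.n started at zero and that these workers were its only
writers):

# Fold the workers' reports together and compare them with the recovered
# table.  Too few increments on a row is corruption; too many is even
# worse, since nothing we ran could legitimately have produced them.  A
# shortfall covered by an update that was in flight at crash time is
# expected and not reported.
sub verify {
    my ($dbh, @reports) = @_;          # each report: [\%committed, $in_flight]
    my (%want, %slack);
    for my $r (@reports) {
        my ($committed, $pending) = @$r;
        $want{$_} += $committed->{$_} for keys %$committed;
        $slack{$pending}++ if defined $pending;  # commits whose fate is unknown
    }
    my $rows = $dbh->selectall_hashref('SELECT id, n FROM counts', 'id');
    for my $id (sort { $a <=> $b } keys %want) {
        my $got     = $rows->{$id} ? $rows->{$id}{n} : 0;
        my $unknown = $slack{$id} || 0;
        next if $got >= $want{$id} && $got <= $want{$id} + $unknown;
        warn "row $id: expected $want{$id} (+ up to $unknown in flight), found $got\n";
    }
}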

Also attached is a bash script that exercises the whole thing. Note
that it has various directories hard-coded that really ought not to
be, and that it has no compunctions about calling rm -r /tmp/data. I
run it as "./do.sh >& log" and then inspect the log file for unusual
lines.

To run this, you first have to apply your own xlog patch, apply my
crash-inducing patch, and build and install the resulting pgsql, then
edit the shell script to point to it, etc. The whole thing is a bit
of an idiosyncratic mess.

Cheers,

Jeff

Attachment Content-Type Size
crash_REL9_2CF4.patch application/octet-stream 6.7 KB
count.pl application/octet-stream 2.6 KB
do.sh application/x-sh 689 bytes
