Re: Separating bgwriter and checkpointer

From: "Dickson S(dot) Guedes" <listas(at)guedesoft(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Separating bgwriter and checkpointer
Date: 2011-10-05 04:10:09
Message-ID: CAHHcrepQNCN3HoXSWAOtsj3HJKp5NoepB2ky3dFgVA4JCXh-Qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2011/10/4 Simon Riggs <simon(at)2ndquadrant(dot)com>:
> The problem is the *same* one I fixed in v2, yet now I see I managed
> to somehow exclude that fix from the earlier patch. Slap. Anyway,
> fixed again now.

Ah ok! I started reviewing the v4 patch version, this is my comments:

Submission review
===============

1. The patch applies cleanly to current master (fa56a0c3e) but isn't
in context diff format;

Feature test
==========

1. Since I patched and make installed it, I can see the expected
processes: writer and checkpointer;

2. I did the following tests with the following results:

2.1 Running a long time pgbench didn't emit any assertion failure or
crash and the checkpoint works as before patch:

LOG: checkpoint starting: xlog
LOG: checkpoint complete: wrote 300 buffers (9.8%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=26.103 s, sync=6.492
s, total=34.000 s; sync files=13, longest=4.684 s, average=0.499 s
LOG: checkpoint starting: time
LOG: checkpoint complete: wrote 257 buffers (8.4%); 0 transaction
log file(s) added, 0 removed, 3 recycled; write=21.819 s, sync=9.610
s, total=32.076 s; sync files=7, longest=6.452 s, average=1.372 s

2.2 Forcing a checkpoint when filesystem has enough free space works
fine while pgbench is running:

LOG: checkpoint starting: immediate force wait
LOG: checkpoint complete: wrote 1605 buffers (52.2%); 0 transaction
log file(s) added, 0 removed, 2 recycled; write=0.344 s, sync=22.750
s, total=23.700 s; sync files=10, longest=15.586 s, average=2.275 s

2.3 Forcing a checkpoint when filesystem are full, works as expected:

LOG: checkpoint starting: immediate force wait time
ERROR: could not write to file "pg_xlog/xlogtemp.4380": Não há
espaço disponível no dispositivo
ERROR: checkpoint request failed
HINT: Consult recent messages in the server log for details.
STATEMENT: CHECKPOINT ;
...
ERROR: could not extend file "base/16384/16405": wrote only 4096 of
8192 bytes at block 10
HINT: Check free disk space.
STATEMENT: INSERT INTO pgbench_history (tid, bid, aid, delta,
mtime) VALUES (69, 3, 609672, -3063, CURRENT_TIMESTAMP);
PANIC: could not write to file "pg_xlog/xlogtemp.4528": Não há
espaço disponível no dispositivo
STATEMENT: END;
LOG: server process (PID 4528) was terminated by signal 6: Aborted

Then I freed some space and started it again and the server ran properly:

LOG: database system was shut down at 2011-10-05 00:46:33 BRT
LOG: database system is ready to accept connections
LOG: autovacuum launcher started
...
LOG: checkpoint starting: immediate force wait
LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=0.000 s, sync=0.000 s,
total=0.181 s; sync files=0, longest=0.000 s, average=0.000 s

2.2 Running a pgbench and interrupting postmaster a few seconds later,
seems to work as expected, returning the output:

... cut ...
LOG: statement: SELECT abalance FROM pgbench_accounts WHERE aid = 148253;
^CLOG: statement: UPDATE pgbench_tellers SET tbalance = tbalance +
934 WHERE tid = 85;
DEBUG: postmaster received signal 2
LOG: received fast shutdown request
LOG: aborting any active transactions
FATAL: terminating connection due to administrator command
FATAL: terminating connection due to administrator command
... cut ...
LOG: disconnection: session time: 0:00:14.917 user=guedes
database=bench host=[local]
LOG: disconnection: session time: 0:00:14.773 user=guedes
database=bench host=[local]
DEBUG: server process (PID 1910) exited with exit code 1
DEBUG: server process (PID 1941) exited with exit code 1
LOG: shutting down
LOG: checkpoint starting: shutdown immediate
DEBUG: SlruScanDirectory invoking callback on pg_multixact/offsets/0000
DEBUG: SlruScanDirectory invoking callback on pg_multixact/members/0000
DEBUG: checkpoint sync: number=1 file=base/16384/16398 time=1075.227 msec
DEBUG: checkpoint sync: number=2 file=base/16384/16406 time=16.832 msec
DEBUG: checkpoint sync: number=3 file=base/16384/16397 time=12306.204 msec
DEBUG: checkpoint sync: number=4 file=base/16384/16397.1 time=2122.141 msec
DEBUG: checkpoint sync: number=5 file=base/16384/16406_fsm time=32.278 msec
DEBUG: checkpoint sync: number=6 file=base/16384/16385_fsm time=11.248 msec
DEBUG: checkpoint sync: number=7 file=base/16384/16388 time=11.083 msec
DEBUG: checkpoint sync: number=8 file=base/16384/11712 time=11.314 msec
DEBUG: checkpoint sync: number=9 file=base/16384/16397_vm time=11.103 msec
DEBUG: checkpoint sync: number=10 file=base/16384/16385 time=19.308 msec
DEBUG: attempting to remove WAL segments older than log file
000000010000000000000000
DEBUG: SlruScanDirectory invoking callback on pg_subtrans/0000
LOG: checkpoint complete: wrote 1659 buffers (54.0%); 0 transaction
log file(s) added, 0 removed, 0 recycled; write=0.025 s, sync=15.617
s, total=15.898 s; sync files=10, longest=12.306 s, average=1.561 s
LOG: database system is shut down

Then I started the server again and it ran properly.

Well, all the tests was running with the default postgresql.conf in my
laptop but I'll setup a more "real world" environment to test for
performance regression. Until now I couldn't notice any significant
difference in TPS before and after patch in a small environment. I'll
post something soon.

Best regards,
--
Dickson S. Guedes
mail/xmpp: guedes(at)guedesoft(dot)net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message daveg 2011-10-05 04:12:22 timezone buglet?
Previous Message Greg Stark 2011-10-05 02:55:28 Re: Inlining comparators as a performance optimisation