Re: Group Commits Vs WAL Writes

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commits Vs WAL Writes
Date: 2013-06-27 22:32:42
Message-ID: CAMkU=1zSyAuCP6cQ=9NjX-3nwW00qPcAz93JUnEDWBRacVWtmg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 27, 2013 at 9:51 AM, Atri Sharma <atri(dot)jiit(at)gmail(dot)com> wrote:

> >
> > commit_delay exists to artificially increase the window in which the
> > leader backend waits for more group commit followers. At higher client
> > counts, that isn't terribly useful because you'll naturally have
> > enough clients anyway, but at lower client counts particularly where
> > fsyncs have high latency, it can help quite a bit. I mention this
> > because clearly commit_delay is intended to trade off latency for
> > throughput. Although having said that, when I worked on commit_delay,
> > the average and worst-case latencies actually *improved* for the
> > workload in question, which consisted of lots of small write
> > transactions. I wouldn't be surprised, though, if you could produce a
> > reasonable case where latency was hurt a bit but throughput improved.
>

Throughput and average latency are strictly reciprocal (at a fixed number
of clients), aren't they? I think when people talk about improving latency,
they must mean something like "improve 95th-percentile latency", not average
latency. Otherwise the distinction doesn't make much sense to me, since
they would be the same thing.
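
To make the reciprocal relationship concrete, here is a minimal sketch. It
assumes a closed-loop benchmark in which each of a fixed number of clients
issues its next commit as soon as the previous one returns; the function
name and the client count of 8 are made up for illustration.

```python
def throughput_tps(clients: int, avg_latency_s: float) -> float:
    """Commits per second for a fixed number of always-busy clients.

    Assumption: closed loop, no think time, so each client completes
    1 / avg_latency_s commits per second.
    """
    return clients / avg_latency_s


# With the client count held constant, lowering the average latency
# raises throughput by the same factor.
for latency_ms in (14.0, 6.0):
    tps = throughput_tps(8, latency_ms / 1000.0)
    print(f"avg latency {latency_ms} ms -> {tps:.1f} commits/s")
```

Under that assumption the average carries no extra information, which is
why a percentile is the more useful latency figure to quote.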

>
> Thanks for your reply.
>
> The logic says that latency will be hit when commit_delay is applied,
> but I am really interested in why we get an improvement instead.
>

There is a spot on the disk to which the current WAL is destined to go.
That spot on the disk is not going to be under the write-head for (say)
another 6 milliseconds.

Without commit_delay, I try to commit my record, but find that someone else
is already holding the lock (and is in the middle of the fsync as well). I
have to wait 6 milliseconds for that person to finish their commit and
release the lock. Then I can start mine, and have to wait another 8
milliseconds (one full rotation of a 7500 rpm disk) for the spot to come
around again, for a total of 14 milliseconds of latency.

With commit_delay, I get my record in under the nose of the person who is
already doing the delay, and they wake up and flush it for me in time to
make the 6 millisecond cutoff. Total 6 milliseconds latency for me.
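
The arithmetic in the two cases above can be sketched as follows. This is
only an illustration of the numbers in this example (6 ms until the spot
arrives, 7500 rpm); the function names are invented here and the model
ignores everything except rotational delay.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def latency_without_delay(wait_for_spot_ms: float, rpm: float) -> float:
    """I miss the in-progress flush: wait for it to finish, then wait
    one more full rotation for my own flush."""
    return wait_for_spot_ms + rotation_ms(rpm)


def latency_with_delay(wait_for_spot_ms: float) -> float:
    """I join the flush of the backend that is still sleeping in
    commit_delay, so I only wait until the spot comes around once."""
    return wait_for_spot_ms


print(rotation_ms(7500))                 # 8.0
print(latency_without_delay(6.0, 7500))  # 14.0
print(latency_with_delay(6.0))           # 6.0
```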

One thing I tried a while ago (before the recent group-commit changes were
made) was to record in shared memory when the last fsync finished. The next
time someone needed to fsync, they would sleep until just before the write
spot was predicted to be under the write head again (previous_finish +
rotation_time - safety_margin, where the difference rotation_time -
safety_margin was represented by a single GUC). It worked pretty well on
the system on which I wrote it, but seemed too brittle to be a general
solution.
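
A rough sketch of that sleep calculation, not the actual patch: the names
below are hypothetical, wake_offset stands for the single setting
(rotation_time - safety_margin), and returning zero when the predicted
moment has already passed is an assumption of this sketch.

```python
def sleep_before_fsync(now: float, previous_finish: float,
                       wake_offset: float) -> float:
    """Seconds to sleep before issuing the next fsync.

    previous_finish: time at which the last fsync completed.
    wake_offset:     rotation_time - safety_margin, as one setting.
    """
    target = previous_finish + wake_offset
    # Assumption: if the predicted moment is already behind us,
    # do not sleep at all.
    return max(0.0, target - now)


# Last fsync finished at t=100.000 s; 8 ms rotation minus a 0.5 ms margin.
print(sleep_before_fsync(now=100.002, previous_finish=100.0,
                         wake_offset=0.0075))
```

The brittleness is visible even here: the prediction is only as good as
the assumed rotation time and the clock used to timestamp the last fsync.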

Another thing I tried was to drop the WALWriteLock after the WAL write
finished but before calling fsync. The theory was that process 1 could
write its WAL and then block on the fsync, and then process 2 could also
write its WAL and also block directly on the fsync, and the kernel/disk
controller would be smart enough to realize that it could merge the two
pending fsync requests into one. This did not work at all, possibly
because my disk controller was very cheap and not very smart.
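
The lock ordering in that experiment looks roughly like the sketch below.
It is a toy model, not PostgreSQL code: threads stand in for backend
processes, a threading.Lock stands in for WALWriteLock, and a temporary
file stands in for the WAL segment. It shows only where the lock is
released, not any performance effect.

```python
import os
import tempfile
import threading

wal_write_lock = threading.Lock()  # stand-in for WALWriteLock


def commit(fd: int, record: bytes) -> None:
    # Hold the lock only while writing the record.
    with wal_write_lock:
        os.write(fd, record)
    # The lock is released before fsync, so several committers can be
    # blocked in fsync at the same time.
    os.fsync(fd)


def demo() -> bytes:
    fd, path = tempfile.mkstemp()
    try:
        threads = [threading.Thread(target=commit, args=(fd, rec))
                   for rec in (b"record-1\n", b"record-2\n")]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.close(fd)
        os.unlink(path)


print(demo())
```

Whether two overlapping fsync calls are actually merged into one physical
flush is up to the kernel and the storage hardware.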

Cheers,

Jeff
