Re: Re: Doc patch making firm recommendation for setting the value of commit_delay

From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Doc patch making firm recommendation for setting the value of commit_delay
Date: 2013-01-28 00:16:24
Message-ID: CAEYLb_WtKY754jaL+Sf=uYHcHw-dn92kLJCR7OW7EAqpH-FqRQ@mail.gmail.com
Lists: pgsql-hackers

Hi Noah,

On 27 January 2013 02:31, Noah Misch <noah(at)leadboat(dot)com> wrote:
> I did a few more benchmarks along the spectrum.

> So that's a nice 27-53% improvement, fairly similar to the pattern for your
> laptop pgbench numbers.

I presume that this applies to a tpc-b benchmark (the pgbench
default). Note that the really compelling numbers that I reported in
that blog post (where there is an increase of over 80% in transaction
throughput at lower client counts) occur with an insert-based
benchmark (i.e. a maximally commit-bound workload).

> Next, based on your comment about the possible value
> for cloud-hosted applications

> -clients-  -tps@commit_delay=0-  -tps@commit_delay=500-
> 32         1224,1391,1584        1175,1229,1394
> 64         1553,1647,1673        1544,1546,1632
> 128        1717,1833,1900        1621,1720,1951
> 256        1664,1717,1918        1734,1832,1918
>
> The numbers are all over the place, but there's more loss than gain.

I suspected that the latency of cloud storage might be relatively
poor. Since that is evidently not actually the case with Amazon EBS,
it makes sense that commit_delay isn't compelling there. I am not
disputing whether or not Amazon EBS should be considered
representative of such systems in general - I'm sure that it should
be.

> There was no appreciable
> performance advantage from setting commit_delay=0 as opposed to relying on
> commit_siblings to suppress the delay. That's good news.

Thank you for doing that research. I had already satisfied myself that
the fast path in MinimumActiveBackends() works well, but it's useful to
have my findings independently verified.

> On the GNU/Linux VM, pg_sleep() achieves precision on the order of 10us.
> However, the sleep was consistently around 70us longer than requested. A
> 300us request yielded a 370us sleep, and a 3000us request gave a 3080us sleep.
> Mac OS X was similarly precise for short sleeps, but it could oversleep a full
> 1000us on a 35000us sleep.

Ugh.

> The beginning of this paragraph still says "commit_delay causes a delay just
> before a synchronous commit attempts to flush WAL to disk". Since it now
> applies to every WAL flush, that should be updated.

Agreed.

> There's a similar problem at the beginning of this paragraph; it says
> specifically, "The commit_delay parameter defines for how many microseconds
> the server process will sleep after writing a commit record to the log with
> LogInsert but before performing a LogFlush."

Right.

> As a side note, if we're ever going to recommend a fire-and-forget method for
> setting commit_delay, it may be worth detecting whether the host sleep
> granularity is limited like this. Setting commit_delay = 20 for your SSD and
> silently getting commit_delay = 10000 would make for an unpleasant surprise.

Yes, it would. Note on possible oversleeping added.
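
For anyone who wants a quick feel for sleep granularity on their own
host, here is a minimal sketch. It times Python's time.sleep(), not
the server's own sleep call, so it only hints at what the backend
will see; the function name and the sample count are my own choices
for illustration.

```python
import statistics
import time


def measure_oversleep(requested_us, samples=20):
    """Return the median oversleep, in microseconds, for a requested sleep.

    This times time.sleep() from user space, so it approximates host sleep
    granularity rather than measuring anything inside PostgreSQL.
    """
    overshoots = []
    for _ in range(samples):
        start = time.perf_counter()
        time.sleep(requested_us / 1_000_000)
        elapsed_us = (time.perf_counter() - start) * 1_000_000
        overshoots.append(elapsed_us - requested_us)
    return statistics.median(overshoots)


# Try a few request sizes, including one as small as an SSD-oriented setting.
for requested_us in (20, 300, 3000):
    oversleep = measure_oversleep(requested_us, samples=5)
    print(f"requested {requested_us}us, median oversleep {oversleep:.0f}us")
```

If the oversleep for a small request is as large as the request
itself, or much larger, then a small commit_delay on that host will
not behave as configured.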

>> ! <para>
>> ! Since the purpose of <varname>commit_delay</varname> is to allow
>> ! the cost of each flush operation to be more effectively amortized
>> ! across concurrently committing transactions (potentially at the
>> ! expense of transaction latency), it is necessary to quantify that
>> ! cost when altering the setting. The higher that cost is, the more
>> ! effective <varname>commit_delay</varname> is expected to be in
>> ! increasing transaction throughput. The
>
> That's true for spinning disks, but I suspect it does not hold for storage
> with internal parallelism, notably virtualized storage. Consider an iSCSI
> configuration with high bandwidth and high latency. When network latency is
> the limiting factor, will sending larger requests less often still help?

Well, I don't like to speculate about things like that, because it's
just too easy to be wrong. That said, it doesn't immediately occur to
me why the statement that you've highlighted wouldn't be true of
virtualised storage that has the characteristics you describe. Any
kind of latency at flush time means that clients sit idle, which means
that the CPU is potentially left underused for a greater share of
wall-clock time than it otherwise would be.

> One would be foolish to run a performance-sensitive workload like those in
> question, including the choice to have synchronous_commit=on, on spinning
> disks with no battery-backed write cache. A cloud environment is more
> credible, but my benchmark showed no gain there.

In an everyday sense you are correct. It would typically be fairly
senseless to run an application that was severely limited by
transaction throughput like this, when a battery-backed cache could be
used at the cost of a couple of hundred dollars. However, it's quite
possible to imagine a scenario in which the economics favoured using
commit_delay instead. For example, I am aware that at Facebook, a
similar Facebook-flavoured-MySQL setting (sync_binlog_timeout_usecs)
is used. Furthermore, it might not be obvious that fsync speed is an
issue in practice. Setting commit_delay to 4,000 has seemingly no
downside on my laptop - it *positively* affects both average and
worst-case transaction latency - so with spinning disks, it probably
would actually be sensible to set it and forget it, regardless of
workload.

When Robert committed this feature, he added an additional check that
is made when WALWriteLock is acquired. That check could see the lock
acquired in a way that turned out to be needless, but it also
prevented a flush that was technically needless from the group commit
leader/lock holder backend's own selfish perspective. I never got
around to satisfying myself that that change helped more than it hurt,
if in fact it had any measurable impact either way. Perhaps I should. The benchmark that
appears on my blog was actually produced with the slightly different,
original version.

> Overall, I still won't
> personally recommend changing commit_delay without measuring the performance
> change for one's particular workload and storage environment. commit_delay
> can now bring some impressive gains in the right situations, but I doubt those
> are common enough for a fire-and-forget setting to do more good than harm.

I agree.

> I suggest having the documentation recommend half of the fsync time as a
> starting point for benchmarking different commit_delay settings against your
> own workload. Indicate that it's more likely to help for direct use of
> spinning disks than for BBWC/solid state/virtualized storage. Not sure what
> else can be credibly given as general advice for PostgreSQL DBAs.

That all seems reasonable. The really important thing is that we don't
state that we don't have a clue what helps - that inspires no
confidence, could turn someone off what would be a really useful
feature for them and just isn't accurate. I also think it's important
that we don't say "Setting commit_delay can only help when there are
many concurrently committing transactions", because roughly the
opposite is true. With many connections, there are already enough
committing transactions to effectively amortize the cost of a flush,
and commit_delay is then only very slightly helpful. Lower client
counts are where commit_delay actually helps (at least with slow fsync
times, which are the compelling case).
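
To make the "half of the fsync time" starting point concrete, here is
a rough sketch that turns the ops/sec figure reported by
pg_test_fsync into a first commit_delay value to benchmark. The
function name is hypothetical, the 125 ops/sec input is an invented
example rather than a measurement, and the cap reflects the
documented upper limit of commit_delay (100000 microseconds).

```python
COMMIT_DELAY_MAX_US = 100_000  # documented upper bound for commit_delay


def suggested_commit_delay_us(fsync_ops_per_sec):
    """Return half the average fsync time, in microseconds, as a starting point.

    fsync_ops_per_sec is the ops/sec figure for the wal_sync_method in use.
    The result is only a first value to benchmark, not a final setting.
    """
    if fsync_ops_per_sec <= 0:
        raise ValueError("fsync_ops_per_sec must be positive")
    fsync_time_us = 1_000_000 / fsync_ops_per_sec
    return min(COMMIT_DELAY_MAX_US, round(fsync_time_us / 2))


# Hypothetical slow disk: 125 fsyncs/sec is 8000us per fsync.
print(suggested_commit_delay_us(125))
```

The value this produces should then be compared against
commit_delay = 0 under one's own workload, since the gain depends on
both the storage and the client count.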

I attach a revision that I think addresses your concerns. I've
polished it a bit further too - in particular, my elaborations about
commit_delay have been concentrated at the end of wal.sgml, where they
belong. I've also removed the reference to XLogInsert, because, since
all XLogFlush call sites are now covered by commit_delay, XLogInsert
isn't particularly relevant.

I have also increased the default time that pg_test_fsync runs - I
think that the kind of variability commonly seen in its output, that
you yourself have reported, justifies doing so in passing.

--
Regards,
Peter Geoghegan

Attachment: commit_delay_doc.2013_01_28.patch (application/octet-stream, 7.5 KB)
