Group Commit

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Group Commit
Date: 2007-03-29 10:52:15
Message-ID: 460B9A5F.1090708@enterprisedb.com

I've been working on the patch to enhance our group commit behavior. The
patch is a dirty hack at the moment, but I'm settled on the algorithm
I'm going to use and I know the issues involved.

Here's the patch as it is if you want to try it out:
http://community.enterprisedb.com/groupcommit-pghead-2.patch

but it needs a rewrite before being accepted. It'll only work on systems
that use sysv semaphores: I needed to add a function to acquire a
semaphore with timeout, and I only did it for sysv_sema.c for now.
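
For readers wondering what "acquire a semaphore with timeout" could look like on a SysV platform, here is a minimal sketch. It is not taken from the patch. It assumes Linux/glibc, where semtimedop() is available as an extension, and the helper name sema_lock_timeout is made up for illustration.

```c
#define _GNU_SOURCE             /* semtimedop() is a Linux/glibc extension */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <time.h>

/*
 * Hypothetical helper (not from the patch): try to decrement semaphore
 * semNum in set semId, giving up after timeout_ms milliseconds.
 * Returns true if the semaphore was acquired, false on timeout or error.
 */
static bool
sema_lock_timeout(int semId, int semNum, long timeout_ms)
{
    struct sembuf sops;
    struct timespec ts;

    assert(timeout_ms >= 0);

    sops.sem_num = (unsigned short) semNum;
    sops.sem_op = -1;           /* decrement, i.e. acquire */
    sops.sem_flg = 0;

    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    for (;;)
    {
        if (semtimedop(semId, &sops, 1, &ts) == 0)
            return true;        /* got the semaphore */
        if (errno == EAGAIN)
            return false;       /* timed out */
        if (errno != EINTR)
            return false;       /* real error; real code would report it */
        /* interrupted by a signal: retry (this restarts the full timeout) */
    }
}
```

Other semaphore implementations would need their own equivalent, which is why the restriction above applies.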

What are the chances of getting this in 8.3, assuming that I rewrite and
submit a patch within the next week or two?

Algorithm
---------

Instead of starting a WAL flush immediately after a commit record is
inserted, we wait a while to give other backends a chance to finish
their transactions and have them flushed by the same fsync call. There's
two things we can control: how many commits to wait for (commit group
size), and for how long (timeout).

We try to estimate the optimal commit group size. The estimate is

commit group size = (# of commit records flushed + # of commit records
arrived while fsyncing).

This is a relatively simple estimate that works reasonably well with
very short transactions, and the timeout limits the damage when the
estimate is not working.
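
As a rough illustration of the two controls described above (group size and timeout), the following sketch models the estimate and the decision to start the flush. The function names and the millisecond bookkeeping are assumptions made for this example, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Estimate for the next commit group size, following the formula above:
 * commit records flushed by the last fsync plus commit records that
 * arrived while that fsync was in progress.
 */
static int
next_group_size(int flushed, int arrived_during_fsync)
{
    assert(flushed >= 0 && arrived_during_fsync >= 0);
    return flushed + arrived_during_fsync;
}

/*
 * Decide whether to start the WAL flush now: either enough commit
 * records have arrived, or we have already waited for the timeout.
 */
static bool
should_start_flush(int commits_pending, int group_size,
                   long waited_ms, long timeout_ms)
{
    return commits_pending >= group_size || waited_ms >= timeout_ms;
}
```

For example, if the last fsync flushed 3 commit records and 2 more arrived while it ran, the next round waits for 5 commits, or for the timeout, whichever comes first.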

There's a lot more factors we could take into account in the estimate,
for example:
- # of backends and their states (affects how many are likely to commit
soon)
- amount of WAL written since last XLogFlush (affects the duration of fsync)
- when exactly the commit records arrive (we don't want to wait 10 ms to
get one more commit record in, when an fsync takes 11 ms)

but I wanted to keep this simple for now.

The timeout is currently hard-coded at 1 ms. I wanted to keep it short
compared to the time it takes to fsync (somewhere in the 5-15 ms
depending on hardware), to limit the damage when the algorithm isn't
getting the estimate right. We could also vary the timeout, but I'm not
sure how to calculate the optimal value and the real granularity will
depend on the system anyhow.

Implementation
--------------

To count the # of commits since last XLogFlush, I added a new
XLogCtlCommit struct in shared memory:

typedef struct XLogCtlCommit
{
    slock_t     commit_lock;    /* protects the struct */
    int         commitCount;    /* # of commit records inserted since
                                 * XLogFlush */
    int         groupSize;      /* current commit group size */
    XLogRecPtr  lastCommitPtr;  /* location of the latest commit record */
    PGPROC     *waiter;         /* process to signal when groupSize is
                                 * reached */
} XLogCtlCommit;

Whenever a commit record is inserted in XLogInsert, commitCount is
incremented and lastCommitPtr is updated.
When it reaches groupSize, the waiter-process is woken up.

In XLogFlush, after acquiring WALWriteLock, we wait until groupSize is
reached (or timeout expires) before doing the flush.

Instead of the current logic to flush as much WAL as possible, we flush
up to the last commit record. Flushing any more wouldn't save us an
fsync later on, but might make the current fsync take longer. By doing
that, we avoid the conditional acquire of the WALInsertLock that's in
there currently. We make note of commitCount before starting the fsync;
that's the # of commit records that arrived in time so that the fsync
will flush them. Let's call that value "intime".

After the fsync is finished, we update the groupSize for the next round.
The new groupSize is the current commitCount after the fsync, IOW the
number of commit records arrived after the previous XLogFlush, including
the time it took to do the fsync. We update the commitCount by
decrementing it by "intime".

Now we're ready for the next round, and we can release WALWriteLock.
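
The counter bookkeeping for one round can be sketched as a single-threaded model. This is only an illustration of the description above: the CommitModel struct and the model_* functions are hypothetical, and locking, the waiter, and the actual fsync are left out.

```c
#include <assert.h>

/* Simplified stand-in for the shared counters in XLogCtlCommit. */
typedef struct CommitModel
{
    int commitCount;    /* commit records inserted since the last flush */
    int groupSize;      /* current commit group size */
} CommitModel;

/*
 * A commit record was inserted.  Returns 1 if the group size has been
 * reached, i.e. the waiting flusher should be woken up.
 */
static int
model_insert_commit(CommitModel *m)
{
    m->commitCount++;
    return m->commitCount >= m->groupSize;
}

/* Just before the fsync: note how many commits arrived in time. */
static int
model_begin_flush(const CommitModel *m)
{
    return m->commitCount;
}

/*
 * After the fsync: the new group size is everything that arrived since
 * the previous flush, including during the fsync; the commits that were
 * just flushed ("intime") are then subtracted from the counter.
 */
static void
model_end_flush(CommitModel *m, int intime)
{
    assert(intime >= 0 && intime <= m->commitCount);
    m->groupSize = m->commitCount;
    m->commitCount -= intime;
}
```

With a group size of 2, two commits trigger the flush; if three more commits arrive during the fsync, the next group size becomes 5 and commitCount starts the next round at 3.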

WALWriteLock
------------

The above would work nicely, except that a normal lwlock doesn't play
nicely. You can release and reacquire a lightweight lock in the same time
slice even when there's other backends queuing for the lock, effectively
cutting the queue.

Here's what sometimes happens, with 2 clients:

Client 1                        Client 2
do work                         do work
insert commit record            insert commit record
acquire WALWriteLock
                                try to acquire WALWriteLock, blocks
fsync
release WALWriteLock
begin new transaction
do work
insert commit record
reacquire WALWriteLock
wait for 2nd commit to arrive

Client 1 will eventually time out and commit just its own commit record.
Client 2 should be released immediately after client 1 releases the
WALWriteLock. It only needs to observe that its commit record has
already been flushed and doesn't need to do anything.

To fix the above, and other race conditions like that, we need a
specialized WALWriteLock that orders the waiters by the commit record
XLogRecPtrs. WALWriteLockRelease wakes up all waiters that have their
commit record already flushed. They will just fall through without
acquiring the lock.
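
A toy model of such a lock's wait queue might look like the following. It is a sketch under stated assumptions: a plain 64-bit integer stands in for XLogRecPtr, the queue is a small fixed array, and the names WaitQueue, wq_enqueue and wq_release are invented for this example. Real code would also need to hand the lock to the next remaining waiter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t Lsn;           /* stand-in for XLogRecPtr */

#define MAX_WAITERS 8

typedef struct WaitQueue
{
    Lsn lsn[MAX_WAITERS];       /* waiters' commit record positions, ascending */
    int nwaiters;
} WaitQueue;

/* Queue a waiter, keeping the queue ordered by commit record position. */
static int
wq_enqueue(WaitQueue *q, Lsn commitLsn)
{
    int i;

    assert(q != NULL);
    if (q->nwaiters >= MAX_WAITERS)
        return -1;              /* queue full */

    /* insertion sort step: shift larger entries up by one slot */
    i = q->nwaiters;
    while (i > 0 && q->lsn[i - 1] > commitLsn)
    {
        q->lsn[i] = q->lsn[i - 1];
        i--;
    }
    q->lsn[i] = commitLsn;
    q->nwaiters++;
    return 0;
}

/*
 * On lock release: wake every waiter whose commit record is already
 * flushed.  Those waiters fall through without taking the lock.
 * Returns the number of waiters woken.
 */
static int
wq_release(WaitQueue *q, Lsn flushedUpTo)
{
    int woken = 0;
    int i;

    assert(q != NULL);
    while (woken < q->nwaiters && q->lsn[woken] <= flushedUpTo)
        woken++;

    /* compact the queue so the remaining waiters start at index 0 */
    for (i = woken; i < q->nwaiters; i++)
        q->lsn[i - woken] = q->lsn[i];
    q->nwaiters -= woken;
    return woken;
}
```

In the two-client scenario above, client 2 would be queued with its commit position and woken by client 1's release, since its record was covered by client 1's fsync.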

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-03-29 11:29:22
Message-ID: 460BA312.7010006@enterprisedb.com
Lists: pgsql-hackers

I wrote:
> What are the chances of getting this in 8.3, assuming that I rewrite and
> submit a patch within the next week or two?

I also intend to do performance testing with different workloads to
ensure the patch doesn't introduce a performance regression under some
conditions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-03-29 16:26:04
Message-ID: 23398.1175185564@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> I've been working on the patch to enhance our group commit behavior. The
> patch is a dirty hack at the moment, but I'm settled on the algorithm
> I'm going to use and I know the issues involved.
> ...
> The timeout is currently hard-coded at 1 ms.

This is where my bogometer triggered. There's way too many platforms
where 1 msec timeout is a sheer fantasy. If you cannot make it perform
well with a 10-msec timeout then I don't think it's going to be at all
portable.

Now I know that newer Linux kernels tend to ship with 1KHz scheduler
tick rate, so there's a useful set of platforms where you could make it
work even so, but I'm not really satisfied with saying "this facility is
only usable if you have a fast kernel tick rate" ...

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-03-29 16:41:46
Message-ID: 460BEC4A.1070904@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> I've been working on the patch to enhance our group commit behavior. The
>> patch is a dirty hack at the moment, but I'm settled on the algorithm
>> I'm going to use and I know the issues involved.
>> ...
>> The timeout is currently hard-coded at 1 ms.
>
> This is where my bogometer triggered. There's way too many platforms
> where 1 msec timeout is a sheer fantasy. If you cannot make it perform
> well with a 10-msec timeout then I don't think it's going to be at all
> portable.
>
> Now I know that newer Linux kernels tend to ship with 1KHz scheduler
> tick rate, so there's a useful set of platforms where you could make it
> work even so, but I'm not really satisfied with saying "this facility is
> only usable if you have a fast kernel tick rate" ...

The 1 ms timeout isn't essential for the algorithm. In fact, I chose it
arbitrarily; in the quick tests I did the length of the timeout didn't
seem to matter much. I'm running with CONFIG_HZ=250 kernel myself, which
means that the timeout is really 4 ms on my laptop.
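
The arithmetic behind that 4 ms figure can be written down as a small helper. This is a simplified model that assumes a sleep is rounded up to a whole number of scheduler ticks; real kernels may add an extra tick, and high-resolution timers change the picture. The function name is made up for this example.

```c
#include <assert.h>

/*
 * Round a requested timeout (in microseconds) up to a whole number of
 * scheduler ticks for a kernel running at 'hz' ticks per second.
 */
static long
effective_timeout_us(long requested_us, int hz)
{
    long tick_us;
    long ticks;

    assert(requested_us > 0 && hz > 0);

    tick_us = 1000000L / hz;                        /* length of one tick */
    ticks = (requested_us + tick_us - 1) / tick_us; /* round up */
    return ticks * tick_us;
}
```

Under this model a 1 ms request becomes 4 ms at CONFIG_HZ=250, 10 ms at CONFIG_HZ=100, and stays 1 ms at CONFIG_HZ=1000.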

I suspect the tick rate largely explains why the current commit_delay
isn't very good: even though you specify it in microseconds, it
really waits a lot longer. With the proposed algorithm, the fsync is
started immediately when enough commit records have been inserted, so
the timeout only comes into play when the estimate for the group size is
too high.

With a higher-precision timer, we could vary not only the commit group
size but also the timeout.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-03-29 16:57:29
Message-ID: 23792.1175187449@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> This is where my bogometer triggered. There's way too many platforms
>> where 1 msec timeout is a sheer fantasy. If you cannot make it perform
>> well with a 10-msec timeout then I don't think it's going to be at all
>> portable.

> The 1 ms timeout isn't essential for the algorithm.

OK, but when you get to performance testing, please see how well it
works at CONFIG_HZ=100.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-09 22:52:19
Message-ID: 15081.1176159139@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> I've been working on the patch to enhance our group commit behavior. The
> patch is a dirty hack at the moment, but I'm settled on the algorithm
> I'm going to use and I know the issues involved.

One question that just came to mind is whether Simon's no-commit-wait
patch doesn't fundamentally alter the context of discussion for this.
Aside from the prospect that people won't really care about group commit
if they can just use the periodic-WAL-sync approach, ISTM that one way
to get group commit is to just make everybody wait for the dedicated
WAL writer to write their commit record. With a sufficiently short
delay between write/fsync attempts in the background process, won't
that net out at about the same place as a complicated group-commit
patch?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-10 01:04:07
Message-ID: 200704100104.l3A147x16083@momjian.us
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> > I've been working on the patch to enhance our group commit behavior. The
> > patch is a dirty hack at the moment, but I'm settled on the algorithm
> > I'm going to use and I know the issues involved.
>
> One question that just came to mind is whether Simon's no-commit-wait
> patch doesn't fundamentally alter the context of discussion for this.
> Aside from the prospect that people won't really care about group commit
> if they can just use the periodic-WAL-sync approach, ISTM that one way
> to get group commit is to just make everybody wait for the dedicated
> WAL writer to write their commit record. With a sufficiently short
> delay between write/fsync attempts in the background process, won't
> that net out at about the same place as a complicated group-commit
> patch?

This is a good point. commit_delay was designed to allow multiple
transactions to fsync with a single fsync. no-commit-wait is going to
do this much more effectively (the client doesn't have to wait for the
other transactions). The one thing commit_delay gives us that
no-commit-wait does not is the guarantee that a commit returned to the
client is on disk, without any milliseconds delay.

The big question is who is going to care about the milliseconds delay
and is using a configuration that is going to benefit from commit_delay.
Basically, commit_delay always had a very limited use-case, but now
with no-commit-wait, commit_delay has an even smaller use-case.

I think the big question is whether commit_delay is ever going to be
generally useful.

I tried to find out what release commit_delay was added, and remembered
that the feature was so questionable we did not mention its addition in
the 7.1 release notes. After six years, we are still unsure about the
feature. Another big question is whether commit_delay is _ever_ going
to be useful, and with no-commit-wait being added, commit_delay looks
even more questionable and perhaps it should just be removed in 8.3.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-10 03:14:25
Message-ID: Pine.GSO.4.64.0704092249360.21736@westnet.com
Lists: pgsql-hackers

On Mon, 9 Apr 2007, Bruce Momjian wrote:

> The big question is who is going to care about the milliseconds delay
> and is using a configuration that is going to benefit from commit_delay.

I care. WAL writes are a major bottleneck when many clients are
committing near the same time. Both times I've played with the
commit_delay settings I found it improved the peak throughput under load
at an acceptable low cost in latency. I'll try to present some numbers on
that when I get time, before you make me cry by taking it away.

An alternate mechanism that tells the client the commit is done when it
hasn't hit disk is of no use for the applications I work with, so I
haven't even been paying attention to no-commit-wait.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-10 03:28:59
Message-ID: 16975.1176175739@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> An alternate mechanism that tells the client the commit is done when it
> hasn't hit disk is of no use for the applications I work with, so I
> haven't even been paying attention to no-commit-wait.

Agreed, if you need "committed" to mean "committed" then no-wait isn't
going to float your boat. But the point I was making is that the
infrastructure Simon proposes (ie, a separate wal-writer process)
might be useful for this case too, with a lot less extra code than
Heikki is thinking about. Now maybe that won't work, but we should
certainly not consider these as entirely-independent patches.

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: gsmith(at)gregsmith(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Group Commit
Date: 2007-04-10 04:50:35
Message-ID: 20070410.135035.35524819.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

> On Mon, 9 Apr 2007, Bruce Momjian wrote:
>
> > The big question is who is going to care about the milliseconds delay
> > and is using a configuration that is going to benefit from commit_delay.
>
> I care. WAL writes are a major bottleneck when many clients are
> committing near the same time. Both times I've played with the
> commit_delay settings I found it improved the peak throughput under load
> at an acceptable low cost in latency. I'll try to present some numbers on
> that when I get time, before you make me cry by taking it away.

Totally agreed here. I experienced throughput improvement by using
commit_delay too.

> An alternate mechanism that tells the client the commit is done when it
> hasn't hit disk is of no use for the applications I work with, so I
> haven't even been paying attention to no-commit-wait.

Agreed too.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-10 10:01:21
Message-ID: E1539E0ED7043848906A8FF995BDA57901E7B9FB@m0143.s-mxs.net
Lists: pgsql-hackers


> > > I've been working on the patch to enhance our group commit behavior.
> > > The patch is a dirty hack at the moment, but I'm settled on the
> > > algorithm I'm going to use and I know the issues involved.
> >
> > One question that just came to mind is whether Simon's no-commit-wait
> > patch doesn't fundamentally alter the context of discussion for this.
> > Aside from the prospect that people won't really care about group
> > commit if they can just use the periodic-WAL-sync approach, ISTM that
> > one way to get group commit is to just make everybody wait for the
> > dedicated WAL writer to write their commit record.

Yes good catch, I think we will want to merge the two.
But, you won't want to wait indefinitely, since imho the dedicated WAL
writer will primarily only want to write/flush full WAL pages. Maybe
flush half full WAL pages only after some longer timeout. But basically
this timeout should be longer than an individual backend is willing to
delay their commit.

> > With a
> > sufficiently short delay between write/fsync attempts in the
> > background process, won't that net out at about the same place as a
> > complicated group-commit patch?

I don't think we want the delay so short, or we won't get any grouped
writes.

I think what we could do is wait up to commit_delay for the
dedicated WAL writer to do its work. If it didn't do so before the
timeout, let the backend do the flushing itself.
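
The fallback rule can be illustrated with a small deterministic model. It only captures the decision and the resulting wait, not real inter-process signaling. The type and function names are hypothetical, and the input "time until the WAL writer's next flush" is assumed to be known, which it would not be in practice.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct CommitOutcome
{
    bool flushed_by_walwriter;  /* did the WAL writer cover our commit? */
    long wait_us;               /* how long the backend waited first */
} CommitOutcome;

/*
 * Model of the proposed rule: the backend waits up to commit_delay_us for
 * the WAL writer, whose next flush happens walwriter_flush_in_us from now.
 * If that flush comes too late, the backend stops waiting at the timeout
 * and flushes the WAL itself.
 */
static CommitOutcome
commit_wait_model(long walwriter_flush_in_us, long commit_delay_us)
{
    CommitOutcome out;

    assert(walwriter_flush_in_us >= 0 && commit_delay_us >= 0);

    if (walwriter_flush_in_us <= commit_delay_us)
    {
        out.flushed_by_walwriter = true;
        out.wait_us = walwriter_flush_in_us;
    }
    else
    {
        out.flushed_by_walwriter = false;
        out.wait_us = commit_delay_us;  /* then the backend flushes itself */
    }
    return out;
}
```

The point of the rule is that commit_delay bounds the extra latency a backend accepts, regardless of how long the WAL writer's own cycle is.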

> I think the big question is whether commit_delay is ever going to be
> generally useful.

It is designed to allow a higher transaction/second rate on a constantly
WAL bottlenecked system, so I think it still has a use case. I think you
should not compare it to no-commit-wait from the feature side (only
implementation).

Andreas


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-10 10:40:58
Message-ID: 461B69BA.3080907@enterprisedb.com
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> I've been working on the patch to enhance our group commit behavior. The
>> patch is a dirty hack at the moment, but I'm settled on the algorithm
>> I'm going to use and I know the issues involved.
>
> One question that just came to mind is whether Simon's no-commit-wait
> patch doesn't fundamentally alter the context of discussion for this.
> Aside from the prospect that people won't really care about group commit
> if they can just use the periodic-WAL-sync approach, ISTM that one way
> to get group commit is to just make everybody wait for the dedicated
> WAL writer to write their commit record. With a sufficiently short
> delay between write/fsync attempts in the background process, won't
> that net out at about the same place as a complicated group-commit
> patch?

Possibly. To get efficient group commit there would need to be some kind
of signaling between the WAL writer and normal backends. I think there
is some in the patch, but I'm not sure if it gives efficient group
commit. A constant delay will just give us something similar to
commit_delay.

I've refrained from spending time on group commit until the
commit-no-wait patch lands, because it's going to conflict anyway. I'm
starting to feel we should not try to rush group commit into 8.3, unless
it somehow falls out of the commit-no-wait patch by accident, given that
we're past feature freeze and coming up with a proper group commit
algorithm would need a lot of research and testing. Better do it for 8.4
with more time, we've got enough features on plate for 8.3 anyway.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-10 15:32:12
Message-ID: 26561.1176219132@sss.pgh.pa.us
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> I've refrained from spending time on group commit until the
> commit-no-wait patch lands, because it's going to conflict anyway. I'm
> starting to feel we should not try to rush group commit into 8.3, unless
> it somehow falls out of the commit-no-wait patch by accident, given that
> we're past feature freeze and coming up with a proper group commit
> algorithm would need a lot of research and testing. Better do it for 8.4
> with more time, we've got enough features on plate for 8.3 anyway.

It's possible that it *would* fall out of commit-no-wait, if we are
alert to the possibility of shaking the tree in the right direction ;-)
Otherwise I agree with waiting till 8.4 to deal with it.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2007-04-13 14:33:18
Message-ID: 1176474798.3635.119.camel@silverbirch.site
Lists: pgsql-hackers

On Tue, 2007-04-10 at 11:40 +0100, Heikki Linnakangas wrote:
> Tom Lane wrote:
> > Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> >> I've been working on the patch to enhance our group commit behavior. The
> >> patch is a dirty hack at the moment, but I'm settled on the algorithm
> >> I'm going to use and I know the issues involved.
> >
> > One question that just came to mind is whether Simon's no-commit-wait
> > patch doesn't fundamentally alter the context of discussion for this.

I was certainly intending that it would.

> > Aside from the prospect that people won't really care about group commit
> > if they can just use the periodic-WAL-sync approach, ISTM that one way
> > to get group commit is to just make everybody wait for the dedicated
> > WAL writer to write their commit record. With a sufficiently short
> > delay between write/fsync attempts in the background process, won't
> > that net out at about the same place as a complicated group-commit
> > patch?
>
> Possibly. To get efficient group commit there would need to be some kind
> of signaling between the WAL writer and normal backends. I think there
> is some in the patch, but I'm not sure if it gives efficient group
> commit. A constant delay will just give us something similar to
> commit_delay.

Agreed.

> I've refrained from spending time on group commit until the
> commit-no-wait patch lands, because it's going to conflict anyway. I'm
> starting to feel we should not try to rush group commit into 8.3, unless
> it somehow falls out of the commit-no-wait patch by accident, given that
> we're past feature freeze and coming up with a proper group commit
> algorithm would need a lot of research and testing. Better do it for 8.4
> with more time, we've got enough features on plate for 8.3 anyway.

My feeling was that I couldn't get both done for 8.3, and that including
the WAL Writer in 8.3 would make the dev path clearer for a later
attempt upon group commit.

I think it was worth exploring whether it would be easy, but I think we
can see it'll take a lot of work to make it "fly right".

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2008-03-06 22:21:09
Message-ID: 200803062221.m26ML9d02750@momjian.us
Lists: pgsql-hackers


Should we remove these now that we have async commit?

#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5 # range 1-1000

They seem unfixable.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Tue, 2007-04-10 at 11:40 +0100, Heikki Linnakangas wrote:
> > Tom Lane wrote:
> > > Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> > >> I've been working on the patch to enhance our group commit behavior. The
> > >> patch is a dirty hack at the moment, but I'm settled on the algorithm
> > >> I'm going to use and I know the issues involved.
> > >
> > > One question that just came to mind is whether Simon's no-commit-wait
> > > patch doesn't fundamentally alter the context of discussion for this.
>
> I was certainly intending that it would.
>
> > > Aside from the prospect that people won't really care about group commit
> > > if they can just use the periodic-WAL-sync approach, ISTM that one way
> > > to get group commit is to just make everybody wait for the dedicated
> > > WAL writer to write their commit record. With a sufficiently short
> > > delay between write/fsync attempts in the background process, won't
> > > that net out at about the same place as a complicated group-commit
> > > patch?
> >
> > Possibly. To get efficient group commit there would need to be some kind
> > of signaling between the WAL writer and normal backends. I think there
> > is some in the patch, but I'm not sure if it gives efficient group
> > commit. A constant delay will just give us something similar to
> > commit_delay.
>
> Agreed.
>
> > I've refrained from spending time on group commit until the
> > commit-no-wait patch lands, because it's going to conflict anyway. I'm
> > starting to feel we should not try to rush group commit into 8.3, unless
> > it somehow falls out of the commit-no-wait patch by accident, given that
> > we're past feature freeze and coming up with a proper group commit
> > algorithm would need a lot of research and testing. Better do it for 8.4
> > with more time, we've got enough features on plate for 8.3 anyway.
>
> My feeling was that I couldn't get both done for 8.3, and that including
> the WAL Writer in 8.3 would make the dev path clearer for a later
> attempt upon group commit.
>
> I think it was worth exploring whether it would be easy, but I think we
> can see it'll take a lot of work to make it "fly right".
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2008-03-07 00:35:53
Message-ID: Pine.GSO.4.64.0803061921080.5044@westnet.com
Lists: pgsql-hackers

On Thu, 6 Mar 2008, Bruce Momjian wrote:

> Should we remove these now that we have async commit?
> #commit_delay = 0 # range 0-100000, in microseconds
> #commit_siblings = 5 # range 1-1000
> They seem unfixable.

commit_delay offers a small but not insignificant improvement for some
people using the feature under bursty, high client loads. The useful
tuning seems to be siblings>[10-20] and a small setting for the delay; I
usually just set it to 1 which gives the minimum the OS is capable of
resolving.

That wasn't the feature's original intention I think, but that's what it's
useful for regardless. As async commit is only applicable in cases where
it's OK to expand the window for transaction loss, removing commit_delay
will cause a small performance regression for users who have tuned it
usefully right now.

I actually have a paper design for something that builds a little model
for how likely it is another commit will be coming soon that essentially
turns this into something that can be tuned automatically, better than any
person can do it. No idea if I'll actually build that thing, but I hope
it's obvious that there's some possibility to improve this area for
applications that can't use async commit. If you're going to dump the
feature, I'd suggest at least waiting until later in the 8.4 cycle to see
if something better comes along first.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2008-03-07 00:57:41
Message-ID: 27983.1204851461@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Smith <gsmith(at)gregsmith(dot)com> writes:
> I actually have a paper design for something that builds a little model
> for how likely it is another commit will be coming soon that essentially
> turns this into something that can be tuned automatically, better than any
> person can do it. No idea if I'll actually build that thing, but I hope
> it's obvious that there's some possibility to improve this area for
> applications that can't use async commit. If you're going to dump the
> feature, I'd suggest at least waiting until later in the 8.4 cycle to see
> if something better comes along first.

What about the other idea of just having committers wait for the next
walwriter-cycle flush before reporting commit?

regards, tom lane


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Group Commit
Date: 2008-03-07 04:09:46
Message-ID: Pine.GSO.4.64.0803062307510.10742@westnet.com
Lists: pgsql-hackers

On Thu, 6 Mar 2008, Tom Lane wrote:

> What about the other idea of just having committers wait for the next
> walwriter-cycle flush before reporting commit?

I haven't considered that too much yet; it may very well be superior to
anything I was thinking of. The only point I was trying to make today is
that I'd prefer not to see commit_delay excised until there's a superior
replacement for it that's suitable even for synchronous write situations.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD