Re: [PATCH 01/16] Overhaul walsender wakeup handling

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH 01/16] Overhaul walsender wakeup handling
Date: 2012-06-22 16:35:07
Message-ID: 201206221835.08069.andres@2ndquadrant.com
Lists: pgsql-hackers

On Friday, June 22, 2012 04:59:45 PM Robert Haas wrote:
> On Fri, Jun 22, 2012 at 10:45 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> >> > the likelihood of that as you know.
> >>
> >> Hmm, well, I guess. I'm still not sure I really understand what
> >> benefit we're getting out of this. If we lose a few WAL records for
> >> an uncommitted transaction, who cares? That transaction is gone
> >> anyway.
> >
> > Well, before the simple fix Simon applied after my initial complaint you
> > didn't get wakeups *at all* in the synchronous_commit=off case.
> >
> > Now, with the additional changes, the walsender is woken exactly when
> > data is available to send and not always when a commit happens. I played
> > around with various scenarios and it always was a win.
>
> Can you elaborate on that a bit? What scenarios did you play around
> with, and what does "win" mean in this context?
I had two machines connected locally and set up HS and my prototype between
them (not at once, obviously).
The patch improved all three things I measured: it reduced the average latency
between both nodes (measured by 'ticker' rows arriving in a table on the
standby), reduced the jitter in latency, and increased the amount of load I
could put on the master before the standby couldn't keep up anymore.
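
As an illustration of how numbers like these can be derived from ticker rows,
here is a minimal sketch. It assumes each ticker row carries the time it was
inserted on the master and that the time it became visible on the standby was
recorded separately; the function name, the sample values, and the choice of
standard deviation as the jitter measure are all hypothetical, not the actual
test harness.

```python
from statistics import mean, pstdev


def lag_stats(sent, arrived):
    """Return (average lag, jitter) in seconds for paired ticker timestamps.

    sent[i]    -- time the i-th ticker row was inserted on the master
    arrived[i] -- time the same row was first seen on the standby
    Jitter is taken to be the population standard deviation of the lag,
    which is an assumption of this sketch.
    """
    if len(sent) != len(arrived) or not sent:
        raise ValueError("need equally sized, non-empty timestamp lists")
    lags = [a - s for s, a in zip(sent, arrived)]
    return mean(lags), pstdev(lags)


# Hypothetical sample: one tick per second, arriving with varying delay.
sent = [0.0, 1.0, 2.0, 3.0]
arrived = [0.5, 1.5, 3.0, 4.0]
avg_lag, jitter = lag_stats(sent, arrived)
print(f"average lag: {avg_lag:.3f}s, jitter: {jitter:.3f}s")
```

Comparing these two numbers with and without the patch under the same load
gives the kind of latency and jitter comparison described above.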

I played with different loads:
* multiple concurrent ~50MB COPYs
* multiple concurrent ~50MB COPYs, pgbench
* pgbench

All three had a ticker running concurrently with synchronous_commit=off
(because it shouldn't cause any difference in the replication pattern itself).

The differences in average lag and cutoff were smallest with just pgbench
running alone and biggest with COPY running alone. High jitter was most visible
with just pgbench running alone, but that's likely just because the average lag
was smaller.

It's not that surprising imo. On workloads that have a high WAL throughput,
like all of the above, XLogInsert frequently has to write out data itself. If
that happens, the walsender might not get woken up in the current setup, so the
walsender/receiver pair is inactive and starts to work like crazy afterwards
to catch up. During that period of higher activity it does fsyncs of
MAX_SEND_SIZE (16 * XLOG_BLCKSZ) at a high rate, which reduces the throughput
of apply...
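
To put a rough number on that catch-up burst, here is a back-of-the-envelope
sketch. It assumes the default build-time XLOG_BLCKSZ of 8192 bytes and a
hypothetical backlog of one ~50MB COPY; the helper name is made up for
illustration, and real backlogs will of course differ.

```python
XLOG_BLCKSZ = 8192                 # default WAL block size (assumed here)
MAX_SEND_SIZE = 16 * XLOG_BLCKSZ   # chunk size quoted above: 128 kB


def chunks_to_catch_up(backlog_bytes):
    """Number of MAX_SEND_SIZE chunks needed to ship a WAL backlog."""
    # Ceiling division: a partial final chunk still counts as one send.
    return -(-backlog_bytes // MAX_SEND_SIZE)


# Hypothetical backlog: the standby fell behind by one ~50MB COPY.
backlog = 50 * 1024 * 1024
n_chunks = chunks_to_catch_up(backlog)
print(f"{MAX_SEND_SIZE} bytes per chunk, {n_chunks} chunks to catch up")
```

If each chunk ends up being fsynced separately on the standby, a backlog of
that size means several hundred fsyncs in quick succession, which fits the
drop in apply throughput described above.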

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
