Re: max_wal_senders must die

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: max_wal_senders must die
Date: 2010-11-13 22:19:55
Message-ID: AANLkTi=QjQLMSCXcTNhVY1q32=UV3S4vkkFuXD-BJhMq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 13, 2010 at 10:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> One idea I've had is that we might want to think about defining an
>> operation that is effectively "store, with a memory barrier".  For
>> example, on x86, this could be implemented using xchg.  I think if you
>> have a single-word variable in shared memory that is always updated
>> using a locked store, then individual backends should be able to read
>> it unlocked without risk of seeing a stale value.
>
> You're still guilty of fuzzy thinking here.  What does "stale" mean?

Well, that obviously depends on what algorithm you use to add and
remove items from the array. I know my thinking is fuzzy; I was
trying to say "I'm mulling over whether we could do something like
this" rather than "I know exactly how to make this work".

*thinks it over*

Here's an algorithm that might work. Let's assume that we're still
going to allocate an array of size max_wal_senders, but we want to
make max_wal_senders relatively large butl avoid iterating over the
entire array on each commit. Store a variable in shared memory called
FirstWalSenderOffset which is always updating using a memory barrier
operation. This value is -1 if there are no currently connected WAL
senders, or the array slot of a currently connected walsender if there
is one. Each array slot also has a structure member called
NextWalSenderOffset, so that the list of connect WAL senders is
effectively stored as a linked list, but with array slot numbers
rather than pointers.

To add a WAL sender, we initialize the new array slot, setting
NextWalSenderOffset to FirstWalSenderOffset, and then set
FirstWalSenderOffset to the slot number of the newly allocated slot.
To remove a WAL sender, we loop through the notional linked list and,
when we find the "pointer" to the slot we want to remove, we override
it with a pointer to the slot to which the removed entry points. All
of these stores are done using the store-with-memory-barrier, so that
readers can iterate over the linked list without a lock. However, we
can't let two processes try to MODIFY the list at the same time (at
least, not without a more powerful memory ordering primitive; COMPXCHG
might be enough) so just protect list modification with a global
spinlock; updates won't be frequent.

This algorithm makes the assumption that it's OK for a scan to miss
WAL senders that are added mid-scan. But the current code makes that
assumption, too: a new WAL sender could grab an array slot we've
already passed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-11-13 22:26:34 Re: plan time of MASSIVE partitioning ...
Previous Message Tom Lane 2010-11-13 22:07:50 Re: 8.4-vintage problem in postmaster.c