Re: Reducing the cost of sinval messaging

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing the cost of sinval messaging
Date: 2014-10-27 21:18:34
Message-ID: CA+Tgmob1qbtJoGNJVKYO3M4VpKHeEn4NxX+PcW49xM5-0ShXsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 27, 2014 at 4:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Oct 27, 2014 at 3:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Why could we not remove the hasMessages flags again, and change the
>>> unlocked test
>>>
>>> if (!stateP->hasMessages)
>>> return 0;
>>>
>>> into
>>>
>>> if (stateP->nextMsgNum == segP->maxMsgNum &&
>>> !stateP->resetState)
>>> return 0;
>>>
>>> If we are looking at stale shared state, this test might produce a
>>> false positive, but I don't see why it's any less safe than testing a
>>> hasMessages boolean.
>
>> It was discussed at the time:
>
>> http://www.postgresql.org/message-id/CA+TgmoY3Q56ZR6i8h+iGhXCw6rCZyvdWJ3RQT=PMVev4-=+N_g@mail.gmail.com
>> http://www.postgresql.org/message-id/13077.1311702188@sss.pgh.pa.us
>
> Neither of those messages seem to me to bear on this point. AFAICS,
> the unlocked hasMessages test has a race condition, which the comment
> just above it argues isn't a problem in practice. What I propose above
> would have an absolutely indistinguishable race condition, which again
> wouldn't be a problem in practice if you believe the comment's argument.

Yes, the read from hasMessages has a race condition, which is in fact
not a problem in practice if you believe the comment's argument.
However, your proposed formulation has a second race condition, which
is discussed in those emails, and which is precisely why we rejected
the design you're now proposing three years ago. That second race
condition is that the read from stateP->nextMsgNum doesn't have to
happen at the same time as the read from stateP->resetState. No CPU or
compiler barrier separates them, so either can happen before the
other. SICleanupQueue updates both values, so you have the problem
first described precisely by Noah:

# (In theory, you could
# hit a case where the load of resetState gives an ancient "false" just as the
# counters wrap to match.

While it practically seems very unlikely that this would ever really
happen, some guy named Tom Lane told us we should worry about it, so I
did. That led to this new design:

http://www.postgresql.org/message-id/CA+TgmobefJvyQDjVBb6TSs996s8ZKvyRzTtPx0ChYo3hve52tg@mail.gmail.com

...which is what ended up getting committed.

Personally, I think trying to further optimize this is most likely a
waste of time. I'm quite sure that there's a whole lot of stuff in
the sinval code that could be further optimized, but it executes so
infrequently that I believe it doesn't matter. Noah beat on this
pretty hard at the time and found that O(N) loop you're complaining
about - in his testing - never went above 0.26% of CPU time and
reduced overall CPU usage even in a sinval-intensive workload with 50
clients per core:

http://www.postgresql.org/message-id/20110729030514.GB30354@tornado.leadboat.com

If you've got some evidence that this is a real problem as opposed to
a hypothetical one, we could certainly reopen the debate about whether
ignoring the possibility that the loads from stateP->nextMsgNum and
stateP->resetState will be widely-enough spaced in time for a
quarter-million messages to be queued up in the meantime is
sufficiently unlikely to ignore. However, absent evidence of a
tangible performance problem, I'd be quite disinclined to throw the
guaranteed safety of the current approach out the window. We could do
some testing where we crank that 262,144 value down to progressively
lower numbers until the race condition actually manifests, just to see
how far we are from disaster. But presumably the goalposts there
shift every time we get a new generation of hardware; and I don't much
like coding on the basis of "eh, seems unlikely the CPU will ever
delay a read by *that* much."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-10-27 21:51:44 Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Previous Message Adam Brightwell 2014-10-27 20:33:45 Re: alter user/role CURRENT_USER