Re: Small SSI issues

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Small SSI issues
Date: 2011-06-10 18:43:58
Message-ID: 4DF265EE.4040209@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10.06.2011 18:05, Kevin Grittner wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> o There is no safeguard against actually wrapping around the
>> SLRU, just the warning
>
> Any thoughts on what we should do instead? If someone holds open a
> transaction long enough to burn through a billion transaction IDs
> (or possibly less if someone uses a smaller BLCKSZ), should we
> generate a FATAL error?

FATAL is a bit harsh, ERROR seems more appropriate.

>> o I'm suspicious of the OldSerXidPagePrecedesLogically() logic
>> with 32k BLCKSZ. In that case, OLDSERXID_MAX_PAGE is a larger
>> than necessary to cover the whole range of 2^32 transactions,
>> so at high XIDs, say 2^32-1, doesn't it incorrectly think
>> that low XIDs, say 10, are in the past, not the future?
>
> I will look that over to see; but if this is broken, then one of the
> other SLRU users is probably broken, because I think I stole this
> code pretty directly from another spot.

It was copied from async.c, which doesn't have this problem because it's
not mapping XIDs to the slru. There, the max queue size is determined by
the *_MAX_PAGE, and you point to a particular location in the SLRU with
simply page+offset.

>>> /*
>>> * Keep a pointer to the currently-running serializable
>>> * transaction (if any) for quick reference.
>>> * TODO SSI: Remove volatile qualifier and the then-unnecessary
>>> * casts?
>>> */
>>> static volatile SERIALIZABLEXACT *MySerializableXact =
>>> InvalidSerializableXact;
>>
>> * So, should we remove it? In most places where contents of
>> MySerializableXact are modified, we're holding
>> SerializableXactHashLock in exclusive mode. However, there's
>> two exceptions:
>>
>> o GetSafeSnapshot() modifies MySerializableXact->flags without
>> any lock. It also inspects
>> MySerializableXact->possibleUnsafeConflicts without a lock.
>> What if somone sets some other flag in the flags bitmap just
>> when GetSafeSnapshot() is about to set
>> SXACT_FLAG_DEFERRABLE_WAITING? One of the flags might be lost
>> :-(.
>>
>> o CheckForSerializableConflictIn() sets SXACT_FLAG_DID_WRITE
>> without holding a lock. The same danger is here if someone
>> else tries to set some other flag concurrently.
>>
>> I think we should simply acquire SerializableXactHashLock in
>> GetSafeSnapshot(). It shouldn't be so much of a hotspot that it
>> would make any difference in performance.
>> CheckForSerializableConflictIn() might be called a lot,
>> however, so maybe we need to check if the flag is set first,
>> and only acquire the lock and set it if it's not set already.
>
> OK.
>
> Do checks such as that argue for keeping the volatile flag, or do
> you think we can drop it if we make those changes? (That would also
> allow dropping a number of casts which exist just to avoid
> warnings.)

I believe we can drop it, I'll double-check.

> I'm happy to work on modifications for any of this or to stay out of
> your way if you want to. Should I put together a patch for those
> items where we seem to agree and have a clear way forward?

I'll fix the MySerializableXact locking issue, and come back to the
other issues on Monday. If you have the time and energy to write a patch
by then, feel free, but I can look into them otherwise.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2011-06-10 18:53:49 Re: [v9.2] Start new timeline for PITR
Previous Message Tom Lane 2011-06-10 18:38:57 Re: gcc 4.6 and hot standby