Re: Proposal for CSN based snapshots

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Ants Aasma <ants(at)cybertec(dot)at>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Markus Wanner <markus(at)bluegap(dot)ch>
Subject: Re: Proposal for CSN based snapshots
Date: 2014-05-12 15:01:59
Message-ID: 5370E267.2020904@vmware.com
Lists: pgsql-hackers

On 05/12/2014 05:41 PM, Andres Freund wrote:
> I haven't fully thought it through but I think it should make some of
> the decoding code simpler. And it should greatly simplify the hot
> standby code.

Cool. I was worried it might conflict with the logical decoding stuff in
some fundamental way, as I'm not really familiar with it.

> Some of the stuff in here will influence whether your freezing
> replacement patch gets in. Do you plan to further pursue that one?

Not sure. I got to the point where it seemed to work, but I got cold
feet about proceeding with it. I used the page header's LSN field to
define the "epoch" of the page, but I started to feel uneasy about it. I
would be much more comfortable with an extra field in the page header,
even though that uses more disk space and requires dealing with pg_upgrade.

>> The core of the design is to store the LSN of the commit record in pg_clog.
>> Currently, we only store 2 bits per transaction there, indicating if the
>> transaction committed or not, but the patch will expand it to 64 bits, to
>> store the LSN. To check the visibility of an XID in a snapshot, the XID's
>> commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.
>
> We'll continue to need some of the old states? You plan to use values
> that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
> etc?

Exactly.
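
As a rough illustration of the scheme discussed above (a sketch only, not
code from the patch): each 64-bit clog slot holds either a reserved value
or the commit record's LSN. The names CommitLSN, CSN_IN_PROGRESS,
CSN_ABORTED, CSN_FIRST_NORMAL and both functions are hypothetical. The
strict "commit LSN < snapshot LSN" comparison is also an assumption; the
proposal only says the two are compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit clog slot: a reserved value or a commit LSN. */
typedef uint64_t CommitLSN;

/* Reserved values, assumed never to be valid commit-record LSNs. */
#define CSN_IN_PROGRESS  ((CommitLSN) 0)   /* "0/0" */
#define CSN_ABORTED      ((CommitLSN) 1)   /* "0/1" */
#define CSN_FIRST_NORMAL ((CommitLSN) 16)  /* assumed cutoff for real LSNs */

/* True if the slot holds a real commit LSN rather than a reserved value. */
static bool
csn_is_committed(CommitLSN slot)
{
    return slot >= CSN_FIRST_NORMAL;
}

/*
 * An XID is visible to a snapshot if it committed and its commit record
 * was written before the snapshot's LSN (assumed strict comparison).
 */
static bool
xid_visible_in_snapshot(CommitLSN slot, CommitLSN snapshot_lsn)
{
    assert(snapshot_lsn >= CSN_FIRST_NORMAL);
    return csn_is_committed(slot) && slot < snapshot_lsn;
}
```

With this layout no proc array scan is needed for the check itself: an
in-progress or aborted XID simply never compares as committed.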

Using 64 bits per XID instead of just 2 will obviously require a lot
more disk space, so we might actually want to still support the old clog
format too, as an "archive" format. The clog for old transactions could
be converted to the more compact 2-bits-per-XID format (or even just 1
bit per XID).
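
To make the space trade-off concrete, here is a minimal sketch of such a
conversion, assuming the reserved values 0 (in progress) and 1 (aborted)
from the discussion above and treating every other slot value as a commit
LSN. The ARCHIVE_* constants and the archive_* functions are hypothetical
names, not existing clog APIs. Packing four XIDs per byte is 32 times
smaller than one 64-bit slot per XID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 2-bit statuses for the compact "archive" format. */
#define ARCHIVE_IN_PROGRESS 0x0
#define ARCHIVE_COMMITTED   0x1
#define ARCHIVE_ABORTED     0x2
#define ARCHIVE_XIDS_PER_BYTE 4

/* Map a 64-bit slot (reserved value or commit LSN) to a 2-bit status. */
static unsigned
archive_status(uint64_t slot)
{
    if (slot == 0)
        return ARCHIVE_IN_PROGRESS;
    if (slot == 1)
        return ARCHIVE_ABORTED;
    return ARCHIVE_COMMITTED;   /* any real LSN means "committed" */
}

/* Pack n slots into out[], which must hold (n + 3) / 4 bytes. */
static void
archive_pack(const uint64_t *slots, size_t n, uint8_t *out)
{
    size_t nbytes = (n + ARCHIVE_XIDS_PER_BYTE - 1) / ARCHIVE_XIDS_PER_BYTE;

    assert(slots != NULL && out != NULL);
    for (size_t i = 0; i < nbytes; i++)
        out[i] = 0;
    for (size_t i = 0; i < n; i++)
    {
        unsigned shift = (unsigned) (i % ARCHIVE_XIDS_PER_BYTE) * 2;

        out[i / ARCHIVE_XIDS_PER_BYTE] |=
            (uint8_t) (archive_status(slots[i]) << shift);
    }
}

/* Read back the 2-bit status of the i-th XID. */
static unsigned
archive_get(const uint8_t *packed, size_t i)
{
    unsigned shift = (unsigned) (i % ARCHIVE_XIDS_PER_BYTE) * 2;

    return (packed[i / ARCHIVE_XIDS_PER_BYTE] >> shift) & 0x3;
}
```

The conversion is lossy by design: the commit LSN itself is discarded, so
it only makes sense for transactions older than any snapshot that could
still need to compare against it.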

> How do you plan to deal with subtransactions?

pg_subtrans will stay unchanged. We could possibly merge it with
pg_clog, reserving some 32-bit chunk of values that are not valid LSNs
to mean an uncommitted subtransaction, with the parent XID. That assumes
that you never need to look up the parent of an already-committed
subtransaction. I thought that was true at first, but I think the SSI
code looks up the parent of a committed subtransaction, to find its
predicate locks. Perhaps it could be changed, but seems best to leave it
alone for now; there will be a lot of code churn anyway.
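
For what the merged layout could look like, here is a purely illustrative
sketch. It assumes the reserved 32-bit chunk is the set of slot values
whose upper 32 bits are all ones, with the parent XID in the lower 32
bits; that choice, SUBXACT_TAG and the slot_* functions are all
hypothetical and not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed reserved range: upper 32 bits all set, never a valid LSN. */
#define SUBXACT_TAG UINT64_C(0xFFFFFFFF00000000)

/* Build the slot value for an uncommitted subtransaction. */
static uint64_t
slot_for_subxact(uint32_t parent_xid)
{
    return SUBXACT_TAG | parent_xid;
}

/* True if the slot marks an uncommitted subtransaction. */
static bool
slot_is_subxact(uint64_t slot)
{
    return (slot & SUBXACT_TAG) == SUBXACT_TAG;
}

/* Extract the parent XID; only valid for subtransaction slots. */
static uint32_t
slot_parent_xid(uint64_t slot)
{
    assert(slot_is_subxact(slot));
    return (uint32_t) (slot & UINT64_C(0xFFFFFFFF));
}
```

Once the subtransaction commits, the slot would be overwritten with the
commit LSN and the parent link is gone, which is exactly why a lookup of
a committed subtransaction's parent would break this layout.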

I think we can get rid of the sub-XID array in PGPROC. It's currently
used to speed up TransactionIdIsInProgress(), but with the patch it will
no longer be necessary to call TransactionIdIsInProgress() every time
you check the visibility of an XID, so it doesn't need to be so fast
anymore.

With the new "commit-in-progress" status in clog, we won't need the
sub-committed clog status anymore. The "commit-in-progress" status will
achieve the same thing.

>> Currently, before consulting the clog for an XID's status, it is necessary
>> to first check if the transaction is still in progress by scanning the proc
>> array. To get rid of that requirement, just before writing the commit record
>> in the WAL, the backend will mark the clog slot with a magic value that says
>> "I'm just about to commit". After writing the commit record, it is replaced
>> with the record's actual LSN. If a backend sees the magic value in the clog,
>> it will wait for the transaction to finish the insertion, and then check
>> again to get the real LSN. I'm thinking of just using XactLockTableWait()
>> for that. This mechanism makes the insertion of a commit WAL record and
>> updating the clog appear atomic to the rest of the system.
>
> So it's quite possible that clog will become more of a contention point
> due to the doubled amount of writes.

Yeah. OTOH, each transaction will take more space in the clog, which
will spread the contention across more pages. And I think there are ways
to mitigate contention in clog, if it becomes a problem. We could make
the locking more fine-grained than one lock per page, use atomic 64-bit
reads/writes on platforms that support it, etc.
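
A minimal sketch of the two-step commit marking with atomic 64-bit
accesses, using C11 <stdatomic.h> rather than PostgreSQL's own
primitives. ClogSlot, the CSN_* values (in particular CSN_COMMITTING = 2)
and all three functions are hypothetical. The reader here does not block;
it reports "retry" so the caller can wait for the committing transaction
(the proposal suggests XactLockTableWait() for that) and look again.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reserved slot values; none is assumed to be a valid LSN. */
#define CSN_IN_PROGRESS UINT64_C(0)
#define CSN_ABORTED     UINT64_C(1)
#define CSN_COMMITTING  UINT64_C(2)   /* "I'm just about to commit" */

typedef _Atomic(uint64_t) ClogSlot;

/* Step 1: called just before the commit record is written to WAL. */
static void
commit_begin(ClogSlot *slot)
{
    atomic_store(slot, CSN_COMMITTING);
}

/* Step 2: called after the commit record is written, with its LSN. */
static void
commit_finish(ClogSlot *slot, uint64_t commit_lsn)
{
    assert(commit_lsn > CSN_COMMITTING);
    atomic_store(slot, commit_lsn);
}

/*
 * Reader side. Returns true and stores the slot value in *result when it
 * is final for now (in progress, aborted, or a commit LSN). Returns false
 * when a commit is in flight; the caller should wait for that transaction
 * and then call again.
 */
static bool
try_read_commit_lsn(ClogSlot *slot, uint64_t *result)
{
    uint64_t v = atomic_load(slot);

    if (v == CSN_COMMITTING)
        return false;
    *result = v;
    return true;
}
```

Because a reader never sees a committed state without also seeing the
LSN, the WAL insertion and the clog update look like one atomic step from
the outside, at the cost of two stores per commit instead of one.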

>> In theory, we could use a snapshot LSN as the cutoff-point for
>> HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
>> that makes me feel uneasy.
>
> It'd possibly also end up being less efficient because you'd visit the
> clog for potentially quite some transactions to get the LSN.

True.

- Heikki
