Re: MVCC catalog access

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: MVCC catalog access
Date: 2013-07-02 14:38:17
Message-ID: CA+Tgmoaug2vg569gYhJd7Q7bb8NEL=SB51V=bGak2p5gp+mR0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jul 2, 2013 at 9:52 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>> > * possibly paranoid, but I'd add a Assert to heap_beginscan_catalog or
>> > GetCatalogSnapshot ensuring we're dealing with a catalog relation. The
>> > consistency mechanisms in GetCatalogSnapshot() only work for those, so
>> > there doesn't seem to be a valid usecase for non-catalog relations.
>>
>> It'll actually work find as things stand; it'll just take a new
>> snapshot every time.
>
> Ok. Doesn't really change my opinion that it's a crappy idea to use it
> otherwise ;)

I agree, but I don't see an easy way to write the assertion you want
using only the OID.

>> - In genam.c and execUtils.c, we treat SnapshotNow as a kind of
>> default snapshot. That seems like a crappy idea. I propose that we
>> either set that pointer to NULL and let the server core dump if the
>> snapshot doesn't get set or (maybe better) add a new special snapshot
>> called SnapshotError which just errors out if you try to use it for
>> anything, and initialize to that.
>
> I vote for SnapshotError.

OK.

>> - I'm not quite sure what to do about get_actual_variable_range().
>> Taking a new MVCC snapshot there seems like it might be pricey on some
>> workloads. However, I wonder if we could use SnapshotDirty.
>> Presumably, even uncommitted tuples affect the amount of
>> index-scanning we have to do, so that approach seems to have some
>> theoretical justification. But I'm worried there could be unfortunate
>> consequences to looking at uncommitted data, either now or in the
>> future. SnapshotSelf seems less likely to have that problem, but
>> feels wrong somehow.
>
> I don't like using SnapshotDirty either. Can't we just use the currently
> active snapshot? Unless I miss something this always will be called
> while we have one since when we plan we've done an explicit
> PushActiveSnapshot() and if we need to replan stuff during execution
> PortalRunSelect() will have pushed one.

We could certainly do that, but I'd be a bit reluctant to do so
without input from Tom. I imagine there might be cases where it could
cause a regression.

>> - currtid_byreloid() and currtid_byrelname() use SnapshotNow as an
>> argument to heap_get_latest_tid(). I don't know what these functions
>> are supposed to be good for, but taking a new snapshot for every
>> function call seems to guarantee that someone will find a way to use
>> these functions as a poster child for how to brutalize PGXACT, so I
>> don't particularly want to do that.
>
> Heikki mentioned that at some point they were added for the odbc
> driver. I am not particularly inclined to worry about taking too many
> snapshots here.

Well, if it uses it with any regularity I think it's a legitimate
concern. We have plenty of customers who use ODBC and some of them
allow frightening numbers of concurrent server connections. Now you
may say that's a bad idea, but so is 1000 backends doing
txid_current() in a loop.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-07-02 14:38:54 Re: New regression test time
Previous Message Bruce Momjian 2013-07-02 14:36:34 Re: Large object + FREEZE?