Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
Date: 2012-06-14 20:51:09
Message-ID: 201206142251.10078.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thursday, June 14, 2012 08:50:51 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> > This patch is problematic because formally indexes used by syscaches
> > needs to be unique, this one is not though because of 0/InvalidOids
> > entries for nailed/shared catalog entries. Those values aren't allowed
> > to be queried though.
> That's not the only reason it's not unique. Take a look at
> GetRelFileNode(). We really only guarantee that <database OID,
> tablespace OID, relfilenode, backend-ID>, taken as a four-tuple, is
> unique. You could have the same relfilenode in different tablespaces,
> or even within the same tablespace with different backend-IDs. The
> latter might not matter for you because you're presumably disregarding
> temp tables, but the former probably does. It's an uncommon scenario
> because we normally set relid = relfilenode, and of course relid is
> unique across the database, but the table gets rewritten then you end
> up with relid != relfilenode, and I don't think there's anything at
> that point that will prevent the new relfilenode from being chosen as
> some other relations relfilenode, as long as it's in a different
> tablespace.
>
> I think the solution may be to create a specialized cache for this,
> rather than relying on the general syscache infrastructure. You might
> look at, e.g., attoptcache.c for an example. That would allow you to
> build a cache that is aware of things like the relmapper
> infrastructure, and the fact that temp tables are ignorable for your
> purposes. But I think you will need to include at least the
> tablespace OID in the key along with the relfilenode to make it
> bullet-proof.
Yes, the tablespace OID should definitely be in there. Need to read up on the
details of an own cache. Once more I didn't want to put in more work before
discussing it here.

> I haven't read through the patch series far enough to know what this
> is being used for yet, but my fear is that you're using it to handle
> mapping a relflenode extracted from the WAL stream back to a relation
> OID. The problem with that is that relfilenode assignments obey
> transaction semantics. So, if someone begins a transaction, truncates
> a table, inserts a tuple, and commits, the heap_insert record is going
> to refer to a relfilenode that, according to the system catalogs,
> doesn't exist. This is similar to one of the worries in my other
> email, so I won't belabor the point too much more here...
Well, yes. We *need* to do the mapping back from the relfilenode to a table.
The idea is that by the fact that the receiving side, be it a full cluster or
just a catalog one, has a fully synchronized catalog in which DDL gets applied
correctly, inside the transaction just as in the sending side, it should never
be wrong to do that mapping.
It probably is necessary to make the syscache lookup/infrastructure to use an
MVCCish snapshot though. No idea how hard that would be yet. Might be a good
argument for your argument of using a specialized cache.

Lets sidetrack this till we have a tender agreement on how to handle DDL ;). I
am aware of the issues with rollbacks, truncate et al...

Thanks,

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2012-06-14 21:00:54 Re: [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode
Previous Message Andres Freund 2012-06-14 20:13:17 Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture