Re: FDW for PostgreSQL

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FDW for PostgreSQL
Date: 2013-02-14 04:01:45
Message-ID: 24455.1360814505@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com> writes:
> [ postgres_fdw.v5.patch ]

I started to look at this patch today. There seems to be quite a bit
left to do to make it committable. I'm willing to work on it, but
there are some things that need discussion:

* The code seems to always use GetOuterUserId() to select the foreign
user mapping to use. This seems entirely wrong. For instance it will
do the wrong thing inside a SECURITY DEFINER function, where surely the
relevant privileges should be those of the function owner, not the
session user. I would also argue that if Alice has access to a foreign
table owned by Bob, and Alice creates a view that selects from that
table and grants select privilege on the view to Charlie, then when
Charlie selects from the view the user mapping to use ought to be
Alice's. (If anyone thinks differently about that, speak up!)
To implement that for queries, we need code similar to what
ExecCheckRTEPerms does, ie "rte->checkAsUser ? rte->checkAsUser :
GetUserId()". It's a bit of a pain to get hold of the RTE from
postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
(Should we modify the APIs for these functions to make that easier?)
I think possibly postgresAcquireSampleRowsFunc should use the foreign
table's owner regardless of the current user ID - if the user has
permission to run ANALYZE then we don't really want the command to
succeed or fail depending on exactly who the user is. That's perhaps
debatable, anybody have another theory?

* AFAICT, the patch expects to use a single connection for all
operations initiated under one foreign server + user mapping pair.
I don't think this can possibly be workable. For instance, we don't
really want postgresIterateForeignScan executing the entire remote query
to completion and stashing the results locally -- what if that's many
megabytes? It ought to be pulling the rows back a few at a time, and
that's not going to work well if multiple scans are sharing the same
connection. (We might be able to dodge that by declaring a cursor
for each scan, but I'm not convinced that such a solution will scale up
to writable foreign tables, nested queries, subtransactions, etc.)
I think we'd better be prepared to allow multiple similar connections.
The main reason I'm bringing this up now is that it breaks the
assumption embodied in postgres_fdw_get_connections() and
postgres_fdw_disconnect() that foreign server + user mapping can
constitute a unique key for identifying connections. However ...

* I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
to be a bad idea altogether. These connections ought to be a hidden
implementation matter, not something that the user has a view of, much
less control over. Aside from the previous issue, I believe it's a
trivial matter to crash the patch as it now stands by applying
postgres_fdw_disconnect() to a connection that's in active use. I can
see the potential value in being able to shut down connections when a
session has stopped using them, but this is a pretty badly-designed
approach to that. I suggest that we just drop these functions for now
and revisit that problem later. (One idea is some sort of GUC setting
to control how many connections can be held open speculatively for
future use.)

* deparse.c contains a depressingly large amount of duplication of logic
from ruleutils.c, and can only need more as we expand the set of
constructs that can be pushed to the remote end. This doesn't seem like
a maintainable approach. Was there a specific reason not to try to use
ruleutils.c for this? I'd much rather tweak ruleutils to expose some
additional APIs, if that's what it takes, than have all this redundant
logic.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vlad Arkhipov 2013-02-14 04:06:11 Re: Temporal features in PostgreSQL
Previous Message Noah Misch 2013-02-14 02:40:20 Re: BUG #7493: Postmaster messages unreadable in a Windows console