Re: FDW for PostgreSQL

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FDW for PostgreSQL
Date: 2013-02-14 09:13:28
Message-ID: CAEZqfEeFmamNB03enZbhuJEp3j4TpTtbHdgBDmeS2rJ+h5dAHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 14, 2013 at 1:01 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> * The code seems to always use GetOuterUserId() to select the foreign
> user mapping to use. This seems entirely wrong. For instance it will
> do the wrong thing inside a SECURITY DEFINER function, where surely the
> relevant privileges should be those of the function owner, not the
> session user. I would also argue that if Alice has access to a foreign
> table owned by Bob, and Alice creates a view that selects from that
> table and grants select privilege on the view to Charlie, then when
> Charlie selects from the view the user mapping to use ought to be
> Alice's. (If anyone thinks differently about that, speak up!)

Agreed that OuterUserId is wrong for user mapping. Also agreed that
Charlie doesn't need his own mapping for the server, if he is
accessing via a valid view.

> To implement that for queries, we need code similar to what
> ExecCheckRTEPerms does, ie "rte->checkAsUser ? rte->checkAsUser :
> GetUserId()". It's a bit of a pain to get hold of the RTE from
> postgresGetForeignRelSize or postgresBeginForeignScan, but it's doable.
> (Should we modify the APIs for these functions to make that easier?)

This issue seems not specific to postgres_fdw. Currently
GetUserMapping takes userid and server's oid as parameters, but we
would able to hide the complex rule by replacing userid with RTE or
something.

> I think possibly postgresAcquireSampleRowsFunc should use the foreign
> table's owner regardless of the current user ID - if the user has
> permission to run ANALYZE then we don't really want the command to
> succeed or fail depending on exactly who the user is. That's perhaps
> debatable, anybody have another theory?

+1. This allows non-owner to ANALYZE foreign tables without having
per-user mapping, though public mapping also solves this issue.

In implementation level, postgresAcquireSampleRowsFunc has Relation of
the target table, so we can get owner's oid by reading
rd_rel->relowner.

> * AFAICT, the patch expects to use a single connection for all
> operations initiated under one foreign server + user mapping pair.
> I don't think this can possibly be workable. For instance, we don't
> really want postgresIterateForeignScan executing the entire remote query
> to completion and stashing the results locally -- what if that's many
> megabytes?

It uses single-row-mode of libpq and TuplestoreState to keep result
locally, so it uses limited memory at a time. If the result is larger
than work_mem, overflowed tuples are written to temp file. I think
this is similar to materializing query results.

> It ought to be pulling the rows back a few at a time, and
> that's not going to work well if multiple scans are sharing the same
> connection. (We might be able to dodge that by declaring a cursor
> for each scan, but I'm not convinced that such a solution will scale up
> to writable foreign tables, nested queries, subtransactions, etc.)

Indeed the FDW used CURSOR in older versions. Sorry for that I have
not looked writable foreign table patch closely yet, but it would
require (may be multiple) remote update query executions during
scanning?

> I think we'd better be prepared to allow multiple similar connections.
> The main reason I'm bringing this up now is that it breaks the
> assumption embodied in postgres_fdw_get_connections() and
> postgres_fdw_disconnect() that foreign server + user mapping can
> constitute a unique key for identifying connections. However ...

Main reason to use single connection is to make multiple results
retrieved from same server in a local query consistent. Shared
snapshot might be helpful for this consistency issue, but I've not
tried that with FDW.

> * I find postgres_fdw_get_connections() and postgres_fdw_disconnect()
> to be a bad idea altogether. These connections ought to be a hidden
> implementation matter, not something that the user has a view of, much
> less control over. Aside from the previous issue, I believe it's a
> trivial matter to crash the patch as it now stands by applying
> postgres_fdw_disconnect() to a connection that's in active use. I can
> see the potential value in being able to shut down connections when a
> session has stopped using them, but this is a pretty badly-designed
> approach to that. I suggest that we just drop these functions for now
> and revisit that problem later. (One idea is some sort of GUC setting
> to control how many connections can be held open speculatively for
> future use.)

Actually these functions follows dblink's similar functions, but
having them is a bad decision because FDW can't connect explicitly.
As you mentioned, postgres_fdw_disconnect is provided for clean
shutdown on remote side (I needed it in my testing).

I agree that separate the issue from FDW core.

> * deparse.c contains a depressingly large amount of duplication of logic
> from ruleutils.c, and can only need more as we expand the set of
> constructs that can be pushed to the remote end. This doesn't seem like
> a maintainable approach. Was there a specific reason not to try to use
> ruleutils.c for this? I'd much rather tweak ruleutils to expose some
> additional APIs, if that's what it takes, than have all this redundant
> logic.

I got a comment about that issue from you, but I might have misunderstood.

(2012/03/09 1:18), Tom Lane wrote:
> I've been looking at this patch a little bit over the past day or so.
> I'm pretty unhappy with deparse.c --- it seems like a real kluge,
> inefficient and full of corner-case bugs. After some thought I believe
> that you're ultimately going to have to abandon depending on ruleutils.c
> for reverse-listing services, and it would be best to bite that bullet
> now and rewrite this code from scratch.

I thought that writing ruleutils-free SQL constructor in postgres_fdw
is better approach because ruleutils might be changed from its own
purpose in future. Besides that, as Kaigai-san mentioned in upthread,
ruleutils seems to have insufficient capability for building remote
PostgreSQL query.

--
Shigeru HANADA

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Albe Laurenz 2013-02-14 09:45:04 Re: FDW for PostgreSQL
Previous Message Heikki Linnakangas 2013-02-14 08:25:07 Re: [COMMITTERS] pgsql: Create libpgcommon, and move pg_malloc et al to it