Re: review: FDW API

Lists: pgsql-hackers
From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Subject: review: FDW API
Date: 2011-01-16 00:55:19
Message-ID: 4D3241F7.7070007@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

what follows is a review of the FDW API patch from
http://archives.postgresql.org/message-id/20110114212358.82C7.6989961C@metrosystems.co.jp

All three patches apply cleanly and compile without warnings. Regression
tests pass.

Let me go patch by patch, starting with the first one that adds the
HANDLER option.

It adds one useless hunk in src/backend/commands/foreigncmds.c (changes
order if #includes).

There's a typo in a C commnent ("determin which validator to be used").

Other than that, it looks OK.

The third patch just adds a GetForeignTable helper function and it looks OK.

The second patch actually adds the API. First of all, I'd like say that
it's a really cool piece of code, allowing all kinds of awesome
funcionality to be added. I'm already excited by the things that this
will make possible. Congratulations!

To get a feel of the API I wrote a simple FDW wrapper that presents data
from the commitfest RSS feed, based heavily on twitter_fdw by Hitoshi
Harada. Please treat my thoughts as someone's who doesn't really know
*why* the API looks like it does, but has some observations about what
was missing or what felt strange when using it. I guess that's the
position a typical FDW wrapper writer will be in.

First of all, the C comments mention that PlanRelScan should put a tuple
descriptor in FdwPlan, but there's no such field in it. Also, comments
for PlanRelScan talk about the 'attnos' argument, which is not in the
function's signature. I guess the comments are just obsolete and should
be updated. I think it's actually a good thing you don't have to put a
TupleDesc in FdwPlan.

There are two API methods, PlanNative and PlanQuery that are ifdef'd out
using IN_THE_FUTURE. Isn't the standard symbol we use NOT_USED? Also,
the comments say you can implement either PlanRelScan or PlanNative, and
only the former is available for now. If we keep these methods
commented, they should be moved to the end of the struct, so that
uncommenting them will not break compatibility with existing FDWs.

The only documentation a FDW writer has is fdwapi.h, so comments there
need to be top-notch. We might contemplate writing a documentation
chapter explaining how FDW handlers should be written, like we do for C
SRFs and libpq programs, but I guess for now just the headers file would
be enough.

FdwExecutionState is just a struct around a void pointer, can we imagine
adding more fields there? If not, maybe we could just remove the
structure and pass void pointers around? OTOH that gives us some
compiler checking and possibility of extending the struct, so I guess we
could also just leave it like that.

The Iterate method gets passed a TupleTableSlot. Do we really need such
a low-level structure? It makes returning the result easy, because you
just form your tuple and call ExecStoreTuple, but perhaps we could
abstract that away a bit and add a helper method that will take a tuple
and call ExecStoreTuple for us, passing InvalidBuffer and false as the
remaining arguments. Or maybe make Iterate return the tuple and call
ExecStoreTuple internally? I don't have strong opinions, but
TupleTableSlot feels a bit too gutty - I'm having a hard time imagining
what fields from it would be useful for a FDW writer, and so perhaps you
don't need to expose it.

Why does BeginScan accept a ParamListInfo argument? First of all, it
feels like a parsing thing, not a relation scan thing, so perhaps it
should be available at some other, earlier stage. Second of all, what
would it be useful for anyway? Neither file_fdw nor my commitfest_fdw
does anything with it.

We could use comments about how to return tuples from Iterate and how to
finish returning them. I had to look at the example to figure out that
you need ExecClearTuple(slot) in your last Iterate. If Iterate's
interface were to change, we could just return NULL instead of a tuple
to say that we're done.

We could be a bit more explicit about how to allocate objects, for
instance if I'm allocating a FdwPlan in my PlanRelScan with a palloc,
will it not go away too soon, or too late (IOW leak)?

I ran into a problem when doing:

select i from generate_series(1, 100000) as g(i), pgcommitfest;

when I was trying to check for leaks. It returned four rows
(pgcommitfest is the foreign table that returns four rows). Explain
analyze shows a nested loop with a foreign scan as inner loop. Maybe
it's because I didn't implement ReScan, but the API says I don't have to.

If you don't implement Iterate you get a elog(ERROR). But if you don't
implement one of the other required methods, you segfault. Feels
inconsistent.

PlanRelScan looks like something that could use all kinds of information
to come up with a good plan. Maybe we could change its input argument to
a single struct that would contain all the current arguments, so it'll
be easier to extend when people will start writing FDWs and will find
out that they'd like more information available. Doing that would mean
that adding a field in a release would mean FDWs just need to be
recompiled and they keep working. Or do we opt for the "better an
explicitly compile error than a silent change" approach? If we'd be just
passing additional info to PlanRelScan I'd say keeping old source
compilable would not be a problem.

Storing private info in FdwPlan's private list is really awkward. I know
it's because you need copyObject support, but it's just a big pain if
you want to store several different things. Passing them around as a big
list and doing list_nth(private, MY_WIDGET_OFFSET) feels like writing
Lisp before you learn it has structures :) Is there really no other way?

Maybe PlanNative should get the foreign table OID, not the server OID,
to resemble PlanRelScan more. The server OID is just a syscache lookup
away, anyway.

If you do EXPLAIN SELECT * FROM foregintable, BeginScan is not called,
but EndScan is. That's weird, and I noticed because I got a segfault
when EndScan tried to free things that BeginScan allocated. Maybe just
don't call EndScan for EXPLAIN?

That's all as far as the API goes. Feel free to ignore most of these
remarks if you see a reason why your choices are better (or necessary).
I just thought I'd try to take a look at it as a user would (which is
what I am, as I don't fully understand the internals) and offer my
impressions.

In general, the feature looks great and I hope it'll make it into 9.1.
And it we'd get the possibility to write FDW handlers in other PLs than
C, it would rock so hard...

I'm going to mark this a Waiting for Author because of the typos, the
BeginScan/EndScan issue, and the nested loop stopping halfway issue. The
rest are suggestions or just thoughts, and if you don't see them as
justified, I'll mark the next patch Ready for Committer.

Cheers,
Jan


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-17 13:13:19
Message-ID: 20110117221318.82F2.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 16 Jan 2011 01:55:19 +0100
Jan Urbański <wulczer(at)wulczer(dot)org> wrote:
> what follows is a review of the FDW API patch from
> http://archives.postgresql.org/message-id/20110114212358.82C7.6989961C@metrosystems.co.jp

Thanks for the comments!
For now, I answer to the first half of your comments. I'll answer to
the rest soon.

> All three patches apply cleanly and compile without warnings. Regression
> tests pass.
>
> Let me go patch by patch, starting with the first one that adds the
> HANDLER option.

Sure.

> It adds one useless hunk in src/backend/commands/foreigncmds.c (changes
> order if #includes).
>
> There's a typo in a C commnent ("determin which validator to be used").
>
> Other than that, it looks OK.

Fixed in attached patch.

> The third patch just adds a GetForeignTable helper function and it looks OK.

Thanks. This patch might be able to committed separately because it
would be small enough, and similar to existing lookup functions such
as GetForeignDataWrapper() and GetForeignServer().

> The second patch actually adds the API. First of all, I'd like say that
> it's a really cool piece of code, allowing all kinds of awesome
> funcionality to be added. I'm already excited by the things that this
> will make possible. Congratulations!
>
> To get a feel of the API I wrote a simple FDW wrapper that presents data
> from the commitfest RSS feed, based heavily on twitter_fdw by Hitoshi
> Harada. Please treat my thoughts as someone's who doesn't really know
> *why* the API looks like it does, but has some observations about what
> was missing or what felt strange when using it. I guess that's the
> position a typical FDW wrapper writer will be in.

Sure, I think your point of view is very important.

> First of all, the C comments mention that PlanRelScan should put a tuple
> descriptor in FdwPlan, but there's no such field in it. Also, comments
> for PlanRelScan talk about the 'attnos' argument, which is not in the
> function's signature. I guess the comments are just obsolete and should
> be updated. I think it's actually a good thing you don't have to put a
> TupleDesc in FdwPlan.

Removed comments about 'attnos' and tuple descriptor.

> There are two API methods, PlanNative and PlanQuery that are ifdef'd out
> using IN_THE_FUTURE. Isn't the standard symbol we use NOT_USED? Also,
> the comments say you can implement either PlanRelScan or PlanNative, and
> only the former is available for now. If we keep these methods
> commented, they should be moved to the end of the struct, so that
> uncommenting them will not break compatibility with existing FDWs.

Agreed. Moved ifdef'd part to the end of struct.

> The only documentation a FDW writer has is fdwapi.h, so comments there
> need to be top-notch. We might contemplate writing a documentation
> chapter explaining how FDW handlers should be written, like we do for C
> SRFs and libpq programs, but I guess for now just the headers file would
> be enough.

file_fdw and postgresql_fdw would be good samples for wrapper
developer if we could ship them as contrib modules.

> FdwExecutionState is just a struct around a void pointer, can we imagine
> adding more fields there? If not, maybe we could just remove the
> structure and pass void pointers around? OTOH that gives us some
> compiler checking and possibility of extending the struct, so I guess we
> could also just leave it like that.

ISTM that using a struct as a interface is better than void*, as you
mentioned.

> The Iterate method gets passed a TupleTableSlot. Do we really need such
> a low-level structure? It makes returning the result easy, because you
> just form your tuple and call ExecStoreTuple, but perhaps we could
> abstract that away a bit and add a helper method that will take a tuple
> and call ExecStoreTuple for us, passing InvalidBuffer and false as the
> remaining arguments. Or maybe make Iterate return the tuple and call
> ExecStoreTuple internally? I don't have strong opinions, but
> TupleTableSlot feels a bit too gutty - I'm having a hard time imagining
> what fields from it would be useful for a FDW writer, and so perhaps you
> don't need to expose it.

This would be debatable issue. Currently Iterate() is expected to
return materialized HeapTuple through TupleTableSlot.

I think an advantage to use TupleTableSlot is that FDW can set shoudFree
flag for the tuple.

> Why does BeginScan accept a ParamListInfo argument? First of all, it
> feels like a parsing thing, not a relation scan thing, so perhaps it
> should be available at some other, earlier stage. Second of all, what
> would it be useful for anyway? Neither file_fdw nor my commitfest_fdw
> does anything with it.

ParamListInfo is added to pass parameters of PREPARE/EXECUTE statement
to FDWs.

Plan for a prepared query is generated at PREPARE with placeholders,
and executed at EXECUTE with actual values. With ParamListInfo
parameter for BeginScan(), FDWs would be able to optimize their remote
query with actual parameter values.

Regard,
--
Shigeru Hanada


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-17 13:17:22
Message-ID: 20110117221721.82F9.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 17 Jan 2011 22:13:19 +0900
Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp> wrote:
> Fixed in attached patch.

Sorry, I have not attached patch to last message.
I'll post revised patch in next message soon.

Regards,
--
Shigeru Hanada


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-18 15:26:17
Message-ID: 20110119002615.8316.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 16 Jan 2011 01:55:19 +0100
Jan Urbański <wulczer(at)wulczer(dot)org> wrote:
<snip>
> In general, the feature looks great and I hope it'll make it into 9.1.
> And it we'd get the possibility to write FDW handlers in other PLs than
> C, it would rock so hard...
>
> I'm going to mark this a Waiting for Author because of the typos, the
> BeginScan/EndScan issue, and the nested loop stopping halfway issue. The
> rest are suggestions or just thoughts, and if you don't see them as
> justified, I'll mark the next patch Ready for Committer.

Thanks a lot for the comments. I've (hopefully) fixed issues above.
Please find attached patches.

== patch list ==

1) 20110118-no_fdw_perm_check.patch - This patch is not included in
last post. This had been proposed on 2011-01-05 first, but maybe has
not been reviewd yet. I re-propose this patch for SQL standard
conformance. This patch removes permission check that requires USAGE
on the foreign-data wrapper at CREATE FOREIGN TABLE.
Please see original post for details.
http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp

2) 20110118-fdw_catalog_lookup.patch - This patch adds GetForeignTables.
Fixed lack of pg_foreign_table.h inclusion.

3) 20110118-fdw_handler.patch - This patch adds support for HANDLER
option to FOREIGN DATA WRAPPER object.

4) 20110118-foreign_scan.patch - This patch adds ForeignScan executor
node and FDW API hooks based on Heikki's proposal. As Itagaki-san
suggested on 2010-12-21, FDW must generate information for EXPLAIN
VERBOSE every PlanRelScan() call. It's better to avoid such overhead,
so new EXPLAIN hook would be needed. I'll try to make it cleaner.

================

And I'll reply to the rest of your comments.

> We could use comments about how to return tuples from Iterate and how to
> finish returning them. I had to look at the example to figure out that
> you need ExecClearTuple(slot) in your last Iterate. If Iterate's
> interface were to change, we could just return NULL instead of a tuple
> to say that we're done.

I've added some comments for FDW-developer to fdwapi.h, though they
wouldn't be enough.

> We could be a bit more explicit about how to allocate objects, for
> instance if I'm allocating a FdwPlan in my PlanRelScan with a palloc,
> will it not go away too soon, or too late (IOW leak)?

For that example, the answer is no. Objects are allocated in
MessageContext if you don't switch context and released when the query
has been finished. I agree that more documentation or comments for
FDW-developers should be added.

> Maybe PlanNative should get the foreign table OID, not the server OID,
> to resemble PlanRelScan more. The server OID is just a syscache lookup
> away, anyway.

You would missed a case that multiple foreign tables are used in a
query. Main purpose of PlanNative is to support pass-through
execution of remote query. In pass-through mode, you can use syntax
as if you have directly connected to external server, so can't use
PostgreSQL's parser.

> If you do EXPLAIN SELECT * FROM foregintable, BeginScan is not called,
> but EndScan is. That's weird, and I noticed because I got a segfault
> when EndScan tried to free things that BeginScan allocated. Maybe just
> don't call EndScan for EXPLAIN?

Fixed to not call EndScan if it was EXPLAIN execution.

Regards,
--
Shigeru Hanada

Attachment Content-Type Size
20110118-no_fdw_perm_check.patch.gz application/octet-stream 360 bytes
20110118-fdw_catalog_lookup.patch.gz application/octet-stream 1.0 KB
20110118-fdw_handler.patch.gz application/octet-stream 8.3 KB
20110118-foreign_scan.patch.gz application/octet-stream 12.0 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-21 15:17:20
Message-ID: 4D39A380.4030003@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18.01.2011 17:26, Shigeru HANADA wrote:
> 1) 20110118-no_fdw_perm_check.patch - This patch is not included in
> last post. This had been proposed on 2011-01-05 first, but maybe has
> not been reviewd yet. I re-propose this patch for SQL standard
> conformance. This patch removes permission check that requires USAGE
> on the foreign-data wrapper at CREATE FOREIGN TABLE.
> Please see original post for details.
> http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp

Committed this part.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-21 15:57:23
Message-ID: AANLkTikzdE1peCYrNC53saXodPq=sKawFGrcUjPVezYD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 21, 2011 at 10:17 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 18.01.2011 17:26, Shigeru HANADA wrote:
>>
>> 1) 20110118-no_fdw_perm_check.patch - This patch is not included in
>> last post.  This had been proposed on 2011-01-05 first, but maybe has
>> not been reviewd yet.  I re-propose this patch for SQL standard
>> conformance.  This patch removes permission check that requires USAGE
>> on the foreign-data wrapper at CREATE FOREIGN TABLE.
>> Please see original post for details.
>>
>> http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp
>
> Committed this part.

How much review have you done of parts (3) and (4)? The key issue for
all of the FDW work in progress seems to be what the handler API is
going to look like, and so once we get that committed it will unblock
a lot of other things.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-21 16:05:44
Message-ID: 4D39AED8.20606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.01.2011 17:57, Robert Haas wrote:
> How much review have you done of parts (3) and (4)?

Not much. I'm getting there..

> The key issue for
> all of the FDW work in progress seems to be what the handler API is
> going to look like, and so once we get that committed it will unblock
> a lot of other things.

Yep. The API that's there now was originally suggested by me, so I
probably won't have big complaints about it. I'll have to also look at
the PostgreSQL and file implementations of it to see that it really fits
the bill.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-21 16:28:19
Message-ID: 4D39B423.1000203@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18.01.2011 17:26, Shigeru HANADA wrote:
> 3) 20110118-fdw_handler.patch - This patch adds support for HANDLER
> option to FOREIGN DATA WRAPPER object.

Some quick comments on that:

* I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
the handler function, if it doesn't exist yet. That's what CREATE
LANGUAGE does, which is similar. Although it doesn't seem to be
documented for CREATE LANGUAGE either, is it deprecated?

* The elogs in parse_func_options() should be ereports.

* pg_dump should check the version number and only try to select
fdwhandler column if >= 9.1. See the other functions there for example
of that.

* dumpForeignDataWrapper() in pg_dump checks if fdwhandler field is "-".
I don't think we use that as magic value there, do we? Same with validator.

* Please check that the HANDLER and VALIDATOR options that pg_dump
creates properly quoted.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-21 22:20:20
Message-ID: 808.1295648420@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Some quick comments on that:

> * I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
> the handler function, if it doesn't exist yet. That's what CREATE
> LANGUAGE does, which is similar. Although it doesn't seem to be
> documented for CREATE LANGUAGE either, is it deprecated?

Doing that would require the equivalent of pg_pltemplate for FDWs, no?
I think we're a long way from wanting to do that. Also, it seems to me
that add-on FDWs are likely to end up getting packaged as extensions,
so the extension machinery will probably render the question moot pretty
soon.

regards, tom lane


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-24 08:29:33
Message-ID: AANLkTi=CYoGDTfXhM+zYXyspmLry5itTZsQKE94uUJrU@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 22, 2011 at 07:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> * I wonder if CREATE FOREIGN DATA WRAPPER should automatically create
>> the handler function, if it doesn't exist yet.
>
> Doing that would require the equivalent of pg_pltemplate for FDWs, no?
> I think we're a long way from wanting to do that.  Also, it seems to me
> that add-on FDWs are likely to end up getting packaged as extensions,

The proposed file_fdw.sql actually creates a default FDW on installation.
So I think the installation scripts work as a template even if we don't
have FDW template catalogs.

+ /* contrib/file_fdw/file_fdw.sql.in */
+ -- create wrapper with validator and handler
+ CREATE OR REPLACE FUNCTION file_fdw_validator (text[], oid)
+ CREATE OR REPLACE FUNCTION file_fdw_handler ()
+ CREATE FOREIGN DATA WRAPPER file_fdw
+ VALIDATOR file_fdw_validator HANDLER file_fdw_handler;

--
Itagaki Takahiro


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-24 13:08:11
Message-ID: 4D3D79BB.6070603@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 21.01.2011 17:57, Robert Haas wrote:
> On Fri, Jan 21, 2011 at 10:17 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 18.01.2011 17:26, Shigeru HANADA wrote:
>>>
>>> 1) 20110118-no_fdw_perm_check.patch - This patch is not included in
>>> last post. This had been proposed on 2011-01-05 first, but maybe has
>>> not been reviewd yet. I re-propose this patch for SQL standard
>>> conformance. This patch removes permission check that requires USAGE
>>> on the foreign-data wrapper at CREATE FOREIGN TABLE.
>>> Please see original post for details.
>>>
>>> http://archives.postgresql.org/message-id/20110105145206.30FD.6989961C@metrosystems.co.jp
>>
>> Committed this part.
>
> How much review have you done of parts (3) and (4)? The key issue for
> all of the FDW work in progress seems to be what the handler API is
> going to look like, and so once we get that committed it will unblock
> a lot of other things.

I've gone through the code in a bit more detail now. I did a bunch of
cosmetic changes along the way, patch attached. I also added a few
paragraphs in the docs. We need more extensive documentation, but this
at least marks the places where I think the docs need to go.

Comments:

* How can a FDW fetch only the columns required by the scan? The file
FDW has to read the whole file anyhow, but it could perhaps skip calling
the input function for unnecessary columns. But more importantly, with
something like the postgresql_fdw you don't want to fetch any extra
columns across the wire. I gather the way to do it is to copy
RelOptInfo->attr_needed to private storage at plan stage, and fill the
not-needed attributes with NULLs in Iterate. That gets a bit awkward,
you need to transform attr_needed to something that can be copied for
starters. Or is that information somehow available at execution phase
otherwise?

* I think we need something in RelOptInfo to mark foreign tables. At the
moment, you need to call IsForeignTable() which does a catalog lookup.
Maybe a new RTEKind, or a boolean flag.

* Can/should we make ReScan optional? Could the executor just do
EndScan+BeginScan if there's no ReScan function?

* Is there any point in allowing a FDW without a handler? It's totally
useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in
previous versions, and it allowed it, but it has always been totally
useless so I don't think we need to worry much about
backwards-compatibility here.

* Is there any use case for changing the handler or validator function
of an existign FDW with ALTER? To me it just seems like an unnecessary
complication.

* IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
experience with another DBMS that had this feature, the SQL being sent
to the remote server was almost always the key piece of information that
I was looking for in the query plans.

* this check in expand_inherited_rtentry seems misplaced:

> /*
> * SELECT FOR UPDATE/SHARE is not allowed on foreign tables because
> * they are read-only.
> */
> if (newrelation->rd_rel->relkind == RELKIND_FOREIGN_TABLE &&
> lockmode != AccessShareLock)
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("SELECT FOR UPDATE/SHARE is not allowed with foreign tables")));

I don't understand why we'd need to do that for inherited tables in
particular. And it's not working for regular non-inherited foreign tables:

postgres=# SELECT * FROM filetbl2 FOR UPDATE;
ERROR: could not open file "base/11933/16397": No such file or directory

* Need to document how the FDW interacts with transaction
commit/rollback. In particular, I believe EndScan is never called if the
transaction is aborted. That needs to be noted explicitly, and need to
suggest how to clean up any external resources in that case. For
example, postgresql_fdw will need to close any open cursors or result sets.

In general, I think the design is sound. What we need is more
documentation. It'd also be nice to see the postgresql_fdw brought back
to shape so that it works against this latest version of the api patch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
small-fdw-changes.patch text/x-diff 20.4 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-24 13:29:53
Message-ID: AANLkTik3E6p81e9FkYvXvP8=FBOX8CURZuc4w+mVPRbz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 8:08 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> * Is there any point in allowing a FDW without a handler? It's totally
> useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in previous
> versions, and it allowed it, but it has always been totally useless so I
> don't think we need to worry much about backwards-compatibility here.

Aren't things like dblink using this in its existing form?

> * Is there any use case for changing the handler or validator function of an
> existign FDW with ALTER? To me it just seems like an unnecessary
> complication.

+1.

> * IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
> experience with another DBMS that had this feature, the SQL being sent to
> the remote server was almost always the key piece of information that I was
> looking for in the query plans.

+1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-30 19:42:02
Message-ID: AANLkTin0mPgFMK+t_Tq8J0X69-He0VPkyTPj4AQ6gC7F@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 24, 2011 at 8:08 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> How much review have you done of parts (3) and (4)?  The key issue for
>> all of the FDW work in progress seems to be what the handler API is
>> going to look like, and so once we get that committed it will unblock
>> a lot of other things.
>
> I've gone through the code in a bit more detail now. I did a bunch of
> cosmetic changes along the way, patch attached. I also added a few
> paragraphs in the docs. We need more extensive documentation, but this at
> least marks the places where I think the docs need to go.
>
> Comments:

I haven't seen any responses to these comments. Time grows short to
get this committed to PostgreSQL 9.1. We need responses to these
comments and an updated patch ASAP.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-31 03:27:55
Message-ID: 20110131122754.9B20.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 21 Jan 2011 18:28:19 +0200
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 18.01.2011 17:26, Shigeru HANADA wrote:
> > 3) 20110118-fdw_handler.patch - This patch adds support for HANDLER
> > option to FOREIGN DATA WRAPPER object.
>
> Some quick comments on that:

Thanks for the comments.
I'll post revised version of patches soon.

> * The elogs in parse_func_options() should be ereports.
>
> * pg_dump should check the version number and only try to select
> fdwhandler column if >= 9.1. See the other functions there for example
> of that.

Fixed.

> * dumpForeignDataWrapper() in pg_dump checks if fdwhandler field is "-".
> I don't think we use that as magic value there, do we? Same with validator.

That magic value, "-", is used as "no-function-was-set" in
dumpForeignDataWrapper() and dumpForeignServer(), and I followed them.
Agreed that magic value should be removed, but it would be a refactoring
issue about pg_dump.

> * Please check that the HANDLER and VALIDATOR options that pg_dump
> creates properly quoted.

I checked quoting for HANDLER and VALIDATOR with file_fdw functions,
and it seems works fine. The pg_dump generats:

------------
CREATE FOREIGN DATA WRAPPER dummy_fdw VALIDATOR public."File_Fdw_Validator"
HANDLER public."FILE_FDW_HANDLER";
------------

from these DDLs:

------------
CREATE OR REPLACE FUNCTION "File_Fdw_Validator" (text[], oid)
RETURNS bool
AS '$libdir/file_fdw','file_fdw_validator'
LANGUAGE C STRICT;

CREATE OR REPLACE FUNCTION "FILE_FDW_HANDLER" ()
RETURNS fdw_handler
AS '$libdir/file_fdw','file_fdw_handler'
LANGUAGE C STRICT;

CREATE FOREIGN DATA WRAPPER dummy_fdw
VALIDATOR "File_Fdw_Validator" HANDLER "FILE_FDW_HANDLER";
------------

Regard,
--
Shigeru Hanada


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-31 13:00:55
Message-ID: 20110131220054.9B28.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 24 Jan 2011 15:08:11 +0200
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I've gone through the code in a bit more detail now. I did a bunch of
> cosmetic changes along the way, patch attached. I also added a few
> paragraphs in the docs. We need more extensive documentation, but this
> at least marks the places where I think the docs need to go.
>
> Comments:

Thanks for the comments!

> * How can a FDW fetch only the columns required by the scan? The file
> FDW has to read the whole file anyhow, but it could perhaps skip calling
> the input function for unnecessary columns. But more importantly, with
> something like the postgresql_fdw you don't want to fetch any extra
> columns across the wire. I gather the way to do it is to copy
> RelOptInfo->attr_needed to private storage at plan stage, and fill the
> not-needed attributes with NULLs in Iterate. That gets a bit awkward,
> you need to transform attr_needed to something that can be copied for
> starters. Or is that information somehow available at execution phase
> otherwise?

I thought that RelOptInfo->reltargetlist, a list of Var, can be used
for that purpose. FdwPlan can copy it with copyObject(), and pass
it to Iterate through FdwPlan->fdw_private.

Then, postgresql_fdw would be able to retrieve only necessary columns,
or just use "NULL" for unnecessary columns in the SELECT clause to
avoid mapping values to columns. Each way would be decrease amount of
data transfer.

> * I think we need something in RelOptInfo to mark foreign tables. At the
> moment, you need to call IsForeignTable() which does a catalog lookup.
> Maybe a new RTEKind, or a boolean flag.

We can avoid catalog lookup with checking table type in get_relation_info()
and updating RelOptInfo->is_foreign_table if the target was a foreign
table.

> * Can/should we make ReScan optional? Could the executor just do
> EndScan+BeginScan if there's no ReScan function?

Right, we have enough information to call BeginScan again. Will fix.

> * Is there any point in allowing a FDW without a handler? It's totally
> useless, isn't it? We had the CREATE FOREIGN DATA WRAPPER syntax in
> previous versions, and it allowed it, but it has always been totally
> useless so I don't think we need to worry much about
> backwards-compatibility here.

dblink (and possibly other external modules) uses FDW without a
handler.

> * Is there any use case for changing the handler or validator function
> of an existign FDW with ALTER? To me it just seems like an unnecessary
> complication.

AFAICS, the only case for that is upgrading FDW to new one without
re-creating foreign tables. I don't have strong opinion for this
issue, and it seems reasonable to remove ALTER feature in first
version.

> * IMHO the "FDW-info" should always be displayed, without VERBOSE. In my
> experience with another DBMS that had this feature, the SQL being sent
> to the remote server was almost always the key piece of information that
> I was looking for in the query plans.

Agreed, will fix to show FDW-info always. Is it reasonable to show
"FDW-info" row even if a FDW set explainInfo to NULL?

> * this check in expand_inherited_rtentry seems misplaced:
>
> > /*
> > * SELECT FOR UPDATE/SHARE is not allowed on foreign tables because
> > * they are read-only.
> > */
> > if (newrelation->rd_rel->relkind == RELKIND_FOREIGN_TABLE &&
> > lockmode != AccessShareLock)
> > ereport(ERROR,
> > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> > errmsg("SELECT FOR UPDATE/SHARE is not allowed with foreign tables")));
>
> I don't understand why we'd need to do that for inherited tables in
> particular. And it's not working for regular non-inherited foreign tables:
>
> postgres=# SELECT * FROM filetbl2 FOR UPDATE;
> ERROR: could not open file "base/11933/16397": No such file or directory

It's a remnants of table inheritance support for foreign tables. This
check should be removed from here, and another check should be added
to avoid above error.

> * Need to document how the FDW interacts with transaction
> commit/rollback. In particular, I believe EndScan is never called if the
> transaction is aborted. That needs to be noted explicitly, and need to
> suggest how to clean up any external resources in that case. For
> example, postgresql_fdw will need to close any open cursors or result sets.

I agree that resource cleanup is an important issue when writing FDW.
FDW should use transaction-safe resources like VirtualFile, or use
ResourceOwner callback mechanism. Is it reasonable to add new page
under "Chapter 35. Extending SQL"?

> In general, I think the design is sound. What we need is more
> documentation. It'd also be nice to see the postgresql_fdw brought back
> to shape so that it works against this latest version of the api patch.

I'll post FDW API patches which reflect comments first, and then I'll
rebase postgresql_fdw against them.

Regards,
--
Shigeru Hanada


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-01-31 13:31:50
Message-ID: AANLkTinwTnxh5JHd69=tD3u7gLzy_C12P00BCg58PDTq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 8:00 AM, Shigeru HANADA
<hanada(at)metrosystems(dot)co(dot)jp> wrote:
>> * Is there any use case for changing the handler or validator function
>> of an existign FDW with ALTER? To me it just seems like an unnecessary
>> complication.
>
> AFAICS, the only case for that is upgrading FDW to new one without
> re-creating foreign tables.  I don't have strong opinion for this
> issue, and it seems reasonable to remove ALTER feature in first
> version.

-1. I don't think that removing the ability to change this is going
to save a measurable amount of complexity, and it certainly will suck
if you need it and don't have it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-07 07:00:37
Message-ID: 20110207160037.8721.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 31 Jan 2011 22:00:55 +0900
Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp> wrote:
> I'll post FDW API patches which reflect comments first, and then I'll
> rebase postgresql_fdw against them.

Sorry for late, attached are revised version of FDW API patches which
reflect Heikki's comments except removing catalog lookup via
IsForeignTable(). ISTM that the point is avoiding catalog lookup
during planning, but I have not found when we can set "foreign table
flag" without catalog lookup during RelOptInfo generation.

Please apply attached patches in this order.

1) fdw_catalog_lookup.patch
2) fdw_handler.patch
3) foreign_scan.patch

To execute SELECT quereis for foreign tables, you need a FDW which has
valid fdwhandler function. The file_fdw which is posted in another
thread "SQL/MED file_fdw" would help.

Changes from last patches are:

1) Now SELECT FOR UPDATE check for foreign tables are done properly in
executor phase, in ExecLockTuple(). Or such check should be done in
parser or planner?

2) Server version is checked in pg_dump (>= 90100).

3) ReScan is not required now. If ReScan is not supplied, ForeignScan
uses EndScan + BeginSacn instead.

4) FDW-Info in EXPLAIN is shown always, except FDW set NULL to
explainInfo.

Regards,
--
Shigeru Hanada

Attachment Content-Type Size
fdw_catalog_lookup.patch.gz application/octet-stream 1013 bytes
fdw_handler.patch.gz application/octet-stream 8.1 KB
foreign_scan.patch.gz application/octet-stream 11.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-07 08:37:37
Message-ID: 4D4FAF51.2040809@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07.02.2011 08:00, Shigeru HANADA wrote:
> Sorry for late, attached are revised version of FDW API patches which
> reflect Heikki's comments except removing catalog lookup via
> IsForeignTable(). ISTM that the point is avoiding catalog lookup
> during planning, but I have not found when we can set "foreign table
> flag" without catalog lookup during RelOptInfo generation.

In get_relation_info(), you do the catalog lookup anyway and have the
Relation object at hand. Add a flag to RelOptInfo indicating if it's a
foreign table or not, and set that in get_relation_info().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-08 11:07:09
Message-ID: 20110208200709.7DED.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Mon, 07 Feb 2011 09:37:37 +0100
Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> On 07.02.2011 08:00, Shigeru HANADA wrote:
> > Sorry for late, attached are revised version of FDW API patches which
> > reflect Heikki's comments except removing catalog lookup via
> > IsForeignTable(). ISTM that the point is avoiding catalog lookup
> > during planning, but I have not found when we can set "foreign table
> > flag" without catalog lookup during RelOptInfo generation.
>
> In get_relation_info(), you do the catalog lookup anyway and have the
> Relation object at hand. Add a flag to RelOptInfo indicating if it's a
> foreign table or not, and set that in get_relation_info().

Thanks a lot.

Attached is a revised version of foreign_scan patch. This still
requires fdw_handler patch which was attached to the orginal post.

Avoid_catalog_lookup.patch is attached for review purpose.
This patch includes changes for this fix.

Regards,
--
Shigeru Hanada

Attachment Content-Type Size
avoid_catalog_lookup.patch application/octet-stream 4.2 KB
20110208-foreign_scan.patch.gz application/octet-stream 12.0 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-11 20:50:55
Message-ID: 4D55A12F.1050706@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08.02.2011 13:07, Shigeru HANADA wrote:
>
> On Mon, 07 Feb 2011 09:37:37 +0100
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> In get_relation_info(), you do the catalog lookup anyway and have the
>> Relation object at hand. Add a flag to RelOptInfo indicating if it's a
>> foreign table or not, and set that in get_relation_info().
>
> Thanks a lot.
>
> Attached is a revised version of foreign_scan patch. This still
> requires fdw_handler patch which was attached to the orginal post.
>
> Avoid_catalog_lookup.patch is attached for review purpose.
> This patch includes changes for this fix.

Thanks.

I spent some more time reviewing this, and working on the PostgreSQL FDW
in tandem. Here's an updated API patch, with a bunch of cosmetic
changes, and a bug fix for FOR SHARE/UPDATE. FOR SHARE/UPDATE on a
foreign table should behave the same as on other relations that can't be
locked, like a function scan. If you explicitly specify such a relation
with "FOR UPDATE OF foo", you should get an error, and if you just have
an unspecified "FOR UPDATE", it should be silently ignored for foreign
tables.

Doing that requires that we can distinguish foreign tables from other
relations in transformLockingClause(), but we don't have a RelOptInfo at
that stage yet. I just used get_rel_relkind() there (and elsewhere
instead of the IsForeignTable() function), but I've got a nagging
feeling that sooner or later we'll have to bite the bullet and add that
field to RangeTblEntry, or introduce a whole new rtekind for foreign tables.

As for the PostgreSQL FDW, I'm trying to make it do two basic tricks, to
validate the API:
1. Only fetch those columns that are actually needed by the query. This
involves examining the baserel->reltargetlist, also paying attention to
whole-row Vars.
2. Push down simple quals, like "column = 'foo'". To do that, I'm trying
to use the deparsing code from ruleutils.c.

That's pretty much what Shigeru's original postgresql_fdw patch also
did, but I've changed the implementation quite heavily to make it work
with the new API. That said, it's still a mess, but I think it validates
that the API is usable. We could offer a lot more help for FDW authors
to make those things easier, but I think this is acceptable for now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fdw-api-1.patch text/x-diff 96.2 KB
postgres_fdw-2.patch text/x-diff 68.6 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-15 18:40:47
Message-ID: 4D5AC8AF.1020905@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11.02.2011 22:50, Heikki Linnakangas wrote:
> On 08.02.2011 13:07, Shigeru HANADA wrote:
>>
>> On Mon, 07 Feb 2011 09:37:37 +0100
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> In get_relation_info(), you do the catalog lookup anyway and have the
>>> Relation object at hand. Add a flag to RelOptInfo indicating if it's a
>>> foreign table or not, and set that in get_relation_info().
>>
>> Thanks a lot.
>>
>> Attached is a revised version of foreign_scan patch. This still
>> requires fdw_handler patch which was attached to the orginal post.
>>
>> Avoid_catalog_lookup.patch is attached for review purpose.
>> This patch includes changes for this fix.
>
> Thanks.
>
> I spent some more time reviewing this, and working on the PostgreSQL FDW
> in tandem. Here's an updated API patch, with a bunch of cosmetic
> changes, and a bug fix for FOR SHARE/UPDATE.

Another version, rebased against master branch and with a bunch of small
cosmetic fixes.

I guess this is as good as this is going to get for 9.1. I'm sure we'll
have to revisit the API in the future as people write more FDWs and we
know their needs better, but there's one detail I'd like to have a
second opinion on before I commit this:

As the patch stands, we have to do get_rel_relkind() in a couple of
places in parse analysis and the planner to distinguish a foreign table
from a regular one. As the patch stands, there's nothing in
RangeTblEntry (which is what we have in transformLockingClause) or
RelOptInfo (for set_plain_rel_pathlist) to directly distinguish them.

I'm actually surprised we don't need to distinguish them in more places,
but nevertheless it feels like we should have that info available more
conveniently, and without requiring a catalog lookup like
get_rel_relkind() does. At first I thought we should add a field to
RelOptInfo, but that doesn't help transformLockingClause. Adding the
field to RangeTblEntry seems quite invasive, and it doesn't feel like
the right place to cache that kind of information anyway. Thoughts on that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fdw-api-2.patch text/x-diff 95.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-15 19:00:06
Message-ID: AANLkTikFtonXzbVaTgVGp=w=8+EiZS_vJY8tEp1uNmge@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 15, 2011 at 1:40 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> I'm actually surprised we don't need to distinguish them in more places, but
> nevertheless it feels like we should have that info available more
> conveniently, and without requiring a catalog lookup like get_rel_relkind()
> does. At first I thought we should add a field to RelOptInfo, but that
> doesn't help transformLockingClause. Adding the field to RangeTblEntry seems
> quite invasive, and it doesn't feel like the right place to cache that kind
> of information anyway. Thoughts on that?

Maybe it should be a new RTEKind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-15 19:13:34
Message-ID: 28279.1297797214@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> As the patch stands, we have to do get_rel_relkind() in a couple of
> places in parse analysis and the planner to distinguish a foreign table
> from a regular one. As the patch stands, there's nothing in
> RangeTblEntry (which is what we have in transformLockingClause) or
> RelOptInfo (for set_plain_rel_pathlist) to directly distinguish them.

Hmm. I don't have a problem with adding relkind to the planner's
RelOptInfo, but it seems to me that if parse analysis needs to know
this, you have put functionality into parse analysis that does not
belong there.

(No, I haven't read the patch...)

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-15 20:36:58
Message-ID: 4D5AE3EA.5050209@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.2011 21:13, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> As the patch stands, we have to do get_rel_relkind() in a couple of
>> places in parse analysis and the planner to distinguish a foreign table
>> from a regular one. As the patch stands, there's nothing in
>> RangeTblEntry (which is what we have in transformLockingClause) or
>> RelOptInfo (for set_plain_rel_pathlist) to directly distinguish them.
>
> Hmm. I don't have a problem with adding relkind to the planner's
> RelOptInfo, but it seems to me that if parse analysis needs to know
> this, you have put functionality into parse analysis that does not
> belong there.

Possibly. We throw the existing errors, for example if you try to do
"FOR UPDATE OF foo" where foo is a set-returning function, in
transformLockingClause(), so it seemed like the logical place to check
for foreign tables too.

Hmm, one approach would be to go ahead and create the RowMarkClauses for
all relations in the parse analysis phase, foreign or not, and throw the
error later, in preprocess_rowmarks(). preprocess_rowmarks() doesn't
currently know if each RowMarkClause was created by "... FOR UPDATE" or
"FOR UPDATE OF foo", but to be consistent with the logic in
transformLockingClause() it would need to. For the former case, it would
need to just ignore foreign tables but for the latter it would need to
throw an error. I guess we could add an "explicit" flag to RowMarkClause
for that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-15 20:37:36
Message-ID: 4D5AE410.80301@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.2011 21:00, Robert Haas wrote:
> On Tue, Feb 15, 2011 at 1:40 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I'm actually surprised we don't need to distinguish them in more places, but
>> nevertheless it feels like we should have that info available more
>> conveniently, and without requiring a catalog lookup like get_rel_relkind()
>> does. At first I thought we should add a field to RelOptInfo, but that
>> doesn't help transformLockingClause. Adding the field to RangeTblEntry seems
>> quite invasive, and it doesn't feel like the right place to cache that kind
>> of information anyway. Thoughts on that?
>
> Maybe it should be a new RTEKind.

That would be even more invasive.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-15 21:00:03
Message-ID: 10161.1297803603@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 15.02.2011 21:13, Tom Lane wrote:
>> Hmm. I don't have a problem with adding relkind to the planner's
>> RelOptInfo, but it seems to me that if parse analysis needs to know
>> this, you have put functionality into parse analysis that does not
>> belong there.

> Possibly. We throw the existing errors, for example if you try to do
> "FOR UPDATE OF foo" where foo is a set-returning function, in
> transformLockingClause(), so it seemed like the logical place to check
> for foreign tables too.

> Hmm, one approach would be to go ahead and create the RowMarkClauses for
> all relations in the parse analysis phase, foreign or not, and throw the
> error later, in preprocess_rowmarks().

I think moving the error check downstream would be a good thing.
Consider for example the case of applying FOR UPDATE to a view. You
can't know what that entails until after the rewriter expands the view.
IIRC, at the moment we're basically duplicating the tests between parse
analysis and the planner, but it's not clear what the value of that is.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-16 11:26:34
Message-ID: 4D5BB46A.2050807@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.2011 23:00, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 15.02.2011 21:13, Tom Lane wrote:
>>> Hmm. I don't have a problem with adding relkind to the planner's
>>> RelOptInfo, but it seems to me that if parse analysis needs to know
>>> this, you have put functionality into parse analysis that does not
>>> belong there.
>
>> Possibly. We throw the existing errors, for example if you try to do
>> "FOR UPDATE OF foo" where foo is a set-returning function, in
>> transformLockingClause(), so it seemed like the logical place to check
>> for foreign tables too.
>
>> Hmm, one approach would be to go ahead and create the RowMarkClauses for
>> all relations in the parse analysis phase, foreign or not, and throw the
>> error later, in preprocess_rowmarks().
>
> I think moving the error check downstream would be a good thing.

Ok, I tried moving the error checks to preprocess_rowmarks().
Unfortunately RelOptInfos haven't been built at that stage yet, so you
still have to do the catalog lookup to get the relkind. That defeats the
purpose.

We could delay the error checking further, but preprocess_rowmarks()
would need to distinguish foreign tables anyway, so that it can mark
them with ROW_MARK_COPY instead of ROW_MARK_REFERENCE.

> IIRC, at the moment we're basically duplicating the tests between parse
> analysis and the planner, but it's not clear what the value of that is.

There's duplicate logic in parse analysis and rewriter, to be precise.
And then there's this one check in make_outerjoininfo:

> /*
> * Presently the executor cannot support FOR UPDATE/SHARE marking of
rels
> * appearing on the nullable side of an outer join. (It's somewhat
unclear
> * what that would mean, anyway: what should we mark when a result
row is
> * generated from no element of the nullable relation?) So, complain if
> * any nullable rel is FOR UPDATE/SHARE.
> *
> * You might be wondering why this test isn't made far upstream in the
> * parser. It's because the parser hasn't got enough info --- consider
> * FOR UPDATE applied to a view. Only after rewriting and flattening do
> * we know whether the view contains an outer join.
> *
> * We use the original RowMarkClause list here; the PlanRowMark list
would
> * list everything.
> */
> foreach(l, root->parse->rowMarks)
> {
> RowMarkClause *rc = (RowMarkClause *) lfirst(l);
>
> if (bms_is_member(rc->rti, right_rels) ||
> (jointype == JOIN_FULL && bms_is_member(rc->rti, left_rels)))
> ereport(ERROR,
> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> errmsg("SELECT FOR UPDATE/SHARE cannot be applied to the
nullable side of an outer join")));
> }

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 15:47:21
Message-ID: 22262.1298044041@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 15.02.2011 23:00, Tom Lane wrote:
>> I think moving the error check downstream would be a good thing.

> Ok, I tried moving the error checks to preprocess_rowmarks().
> Unfortunately RelOptInfos haven't been built at that stage yet, so you
> still have to do the catalog lookup to get the relkind. That defeats the
> purpose.

Mph. It seems like the right fix here is to add relkind to
RangeTblEntry: it could be filled in for free in addRangeTableEntry, for
example. The main downside of that is that relation relkinds would have
to become fixed (because there would be no practical way of updating
RTEs in stored rules), which means the "convert relation to view" hack
would have to go away. Offhand I think no one cares about that anymore,
but ...

In any case, this is looking like a performance optimization that should
be dealt with in a separate patch. I'd suggest leaving the code in the
form with the extra relkind lookups for the initial commit. It's
possible that no one would notice the extra lookups anyway --- have you
benchmarked it?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 19:20:32
Message-ID: AANLkTikWvCzH6fdYhrb9VEbrwDX3ZdYYfFpjs5aaLjg_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 18, 2011 at 10:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> On 15.02.2011 23:00, Tom Lane wrote:
>>> I think moving the error check downstream would be a good thing.
>
>> Ok, I tried moving the error checks to preprocess_rowmarks().
>> Unfortunately RelOptInfos haven't been built at that stage yet, so you
>> still have to do the catalog lookup to get the relkind. That defeats the
>> purpose.
>
> Mph.  It seems like the right fix here is to add relkind to
> RangeTblEntry: it could be filled in for free in addRangeTableEntry, for
> example.

Heikki and I came to the same conclusion yesterday while chatting
about this on IM.

> The main downside of that is that relation relkinds would have
> to become fixed (because there would be no practical way of updating
> RTEs in stored rules), which means the "convert relation to view" hack
> would have to go away.  Offhand I think no one cares about that anymore,
> but ...

That actually sounds like a possible problem, because it's possible to
create views with circular dependencies using CORV, and they won't
dump-and-reload properly without that hack. It's not a particularly
useful thing to do, of course, and I think we could reengineer pg_dump
to not need the hack even if someone does do it, but that sounds like
more work than we want to tackle right now.

> In any case, this is looking like a performance optimization that should
> be dealt with in a separate patch.  I'd suggest leaving the code in the
> form with the extra relkind lookups for the initial commit.  It's
> possible that no one would notice the extra lookups anyway --- have you
> benchmarked it?

This is a good point... although I think this results in at least one
extra syscache lookup per table per SELECT, even when no foreign
tables are involved.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 19:35:06
Message-ID: 13459.1298057706@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Feb 18, 2011 at 10:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The main downside of that is that relation relkinds would have
>> to become fixed (because there would be no practical way of updating
>> RTEs in stored rules), which means the "convert relation to view" hack
>> would have to go away. Offhand I think no one cares about that anymore,
>> but ...

> That actually sounds like a possible problem, because it's possible to
> create views with circular dependencies using CORV, and they won't
> dump-and-reload properly without that hack. It's not a particularly
> useful thing to do, of course, and I think we could reengineer pg_dump
> to not need the hack even if someone does do it, but that sounds like
> more work than we want to tackle right now.

Urgh. That's problematic, because even if we changed pg_dump (which
would not be that hard I think), we'd still have to cope with dump files
emitted by existing versions of pg_dump. The time constant before that
stops being an issue is measured in years. I'm not at all sure whether
the circular dependency case is infrequent enough that we could get away
with saying "tough luck" to people who hit the case.

[ thinks a bit ... ] But we can probably hack our way around that:
teach the rule rewriter to update relkind in any RTE it brings in from a
stored rule. We already do something similar in some other cases where
a stored parsetree node contains information that could become obsolete.

But that conclusion just makes it even clearer that fixing this
performance problem, if it even is real, should be a separate patch.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 20:16:07
Message-ID: 14451.1298060167@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Another version, rebased against master branch and with a bunch of small
> cosmetic fixes.

> I guess this is as good as this is going to get for 9.1.

This is *badly* in need of another cleanup pass; it's full of typos,
contradictory comments, #ifdef NOT_USED stuff, etc etc. And the
documentation is really inadequate. If you're out of energy to go
over it, I guess I should step up.

Question after first look: what is the motivation for passing
estate->es_param_list_info to BeginScan? AFAICS, even if there is a
reason for that function to need that, it isn't receiving any info that
would be sufficient to let it know what's in there. What seems more
likely to be useful is to pass in the EState pointer, as for example
being able to look at es_query_cxt seems like a good idea.

BTW, I see no particularly good reason to let the FDW omit ReScan.
If it wants to implement that as end-and-begin, it can do so internally.
It would be a lot clearer to just insist that all the function pointers
be valid, as indeed some (not all) of the comments say already.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 20:30:08
Message-ID: 4D5ED6D0.1080906@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18.02.2011 22:16, Tom Lane wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Another version, rebased against master branch and with a bunch of small
>> cosmetic fixes.
>
>> I guess this is as good as this is going to get for 9.1.
>
> This is *badly* in need of another cleanup pass; it's full of typos,
> contradictory comments, #ifdef NOT_USED stuff, etc etc. And the
> documentation is really inadequate. If you're out of energy to go
> over it, I guess I should step up.

If you have the energy, by all means, thanks.

> Question after first look: what is the motivation for passing
> estate->es_param_list_info to BeginScan? AFAICS, even if there is a
> reason for that function to need that, it isn't receiving any info that
> would be sufficient to let it know what's in there.

The idea is that when the query is planned, the FDW can choose to push
down a qual that contains a parameter marker, like "WHERE remotecol =
$1". At execution time, it needs the value of the parameter to send it
to the remote server. The PostgreSQL FDW does that, although I didn't
test it so it might well be broken.

> What seems more
> likely to be useful is to pass in the EState pointer, as for example
> being able to look at es_query_cxt seems like a good idea.

By "look at", you mean allocate stuff in it?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 20:52:14
Message-ID: 15287.1298062334@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 18.02.2011 22:16, Tom Lane wrote:
>> Question after first look: what is the motivation for passing
>> estate->es_param_list_info to BeginScan? AFAICS, even if there is a
>> reason for that function to need that, it isn't receiving any info that
>> would be sufficient to let it know what's in there.

> The idea is that when the query is planned, the FDW can choose to push
> down a qual that contains a parameter marker, like "WHERE remotecol =
> $1". At execution time, it needs the value of the parameter to send it
> to the remote server. The PostgreSQL FDW does that, although I didn't
> test it so it might well be broken.

s/might well be/is/ --- there's no guarantee that parameters are valid
at executor setup time. The place that needs to be grabbing the
parameter value for that purpose is BeginScan.

>> What seems more
>> likely to be useful is to pass in the EState pointer, as for example
>> being able to look at es_query_cxt seems like a good idea.

> By "look at", you mean allocate stuff in it?

Right. I suppose you're going to comment that CurrentMemoryContext is
probably the same thing, but in general it's not going to pay to make
this API run with blinders on. My feeling is it'd be best to pass down
all the information the executor node has got --- probably we should
just pass the ForeignScanState node itself, and leave a void * in that
for FDW-private data, and be done with it. Otherwise we're going to be
adding missed stuff back to the API every time somebody notices that
their FDW can't do X because they don't have access to the necessary
information. That definitional instability will trump any ABI stability
that might be gained from not relying on executor node types. (And it's
not like changing ScanState in a released version is an entirely safe
thing to do even today --- there are lots of loadable modules that know
about that struct.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-18 23:41:05
Message-ID: 28233.1298072465@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> ... My feeling is it'd be best to pass down
> all the information the executor node has got --- probably we should
> just pass the ForeignScanState node itself, and leave a void * in that
> for FDW-private data, and be done with it. Otherwise we're going to be
> adding missed stuff back to the API every time somebody notices that
> their FDW can't do X because they don't have access to the necessary
> information.

Attached is a rewritten version of fdwhandler.sgml that specifies what I
think is a more future-proof API for the callback functions. Barring
objections, I'll push ahead with editing the code to match.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 6.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-20 05:28:20
Message-ID: 4759.1298179700@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 11.02.2011 22:50, Heikki Linnakangas wrote:
>> I spent some more time reviewing this, and working on the PostgreSQL FDW
>> in tandem. Here's an updated API patch, with a bunch of cosmetic
>> changes, and a bug fix for FOR SHARE/UPDATE.

> Another version, rebased against master branch and with a bunch of small
> cosmetic fixes.

I've applied this after a moderate amount of editorialization.

The question of avoiding extra relkind lookups is still open. We talked
about adding relkind to RangeTblEntry, but I wonder whether adding a new
RTEKind would be a better idea. Haven't researched it yet.

I have a hacked-up version of contrib/file_fdw that I've been using to
test it with. That needs some more cleanup before committing, but I
think it should not take too long. The one thing that is kind of
annoying is that the regression tests need generated files (to insert
absolute paths) and it seems like the PGXS infrastructure doesn't know
how to clean up the generated files afterwards. Anybody have any
thoughts about fixing that?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-20 22:38:13
Message-ID: 12581.1298241493@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Feb 18, 2011 at 10:47 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The main downside of that is that relation relkinds would have
>>> to become fixed (because there would be no practical way of updating
>>> RTEs in stored rules), which means the "convert relation to view" hack
>>> would have to go away.

>> That actually sounds like a possible problem, because it's possible to
>> create views with circular dependencies using CORV, and they won't
>> dump-and-reload properly without that hack.

> Urgh. That's problematic, because even if we changed pg_dump (which
> would not be that hard I think), we'd still have to cope with dump files
> emitted by existing versions of pg_dump.

> [ thinks a bit ... ] But we can probably hack our way around that:
> teach the rule rewriter to update relkind in any RTE it brings in from a
> stored rule. We already do something similar in some other cases where
> a stored parsetree node contains information that could become obsolete.

I did a bit of poking around here, and came to the following
conclusions:

1. We don't want to add another RTEKind. RTE_RELATION basically has the
semantics of "anything with a pg_class OID", so it ought to include
foreign tables. Therefore the fix ought to be to add relkind to
RangeTblEntry. (BTW, so far as I can tell, RTE_SPECIAL is obsolete:
there are assorted switch cases that handle it, but no place that can
generate the value. I'm inclined to delete it while we are messing
with this.)

2. In the current code layout, to make sense of relkind you need to
#include catalog/pg_class.h where the values for relkind are #defined.
I dislike the idea of that being true for a field of such a widely-known
struct as RangeTblEntry. Accordingly, I suggest that we move those
values into parsenodes.h. (Perhaps we could convert them to an enum,
too, though still keeping the same ASCII values.)

3. We can have the rewriter update an RTE's stored value of relkind
during AcquireRewriteLocks: it opens the rel for each RTE_RELATION entry
anyway, so copying over the relkind is essentially free. While it's not
instantly obvious that that is "soon enough", I think that it is, since
up to the point of acquiring a lock there we can't assume that the rel
isn't being changed or dropped undeneath us --- that is, any earlier
test on an RTE's relkind might be testing just-obsoleted state anyway.

4. I had hoped that we might be able to get rid of some pre-existing
syscache lookups, but at least so far as the parse/plan/execute chain
is concerned, there don't seem to be any other places that need to
fetch a relkind based on just an RTE entry.

So point #4 is a bit discouraging, but other that, this seems like
a fairly straightforward exercise. I'm inclined to go ahead and do it,
unless there are objections.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>, Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: review: FDW API
Date: 2011-02-23 00:28:05
Message-ID: 8513.1298420885@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> I did a bit of poking around here, and came to the following
> conclusions:

> 1. We don't want to add another RTEKind. RTE_RELATION basically has the
> semantics of "anything with a pg_class OID", so it ought to include
> foreign tables. Therefore the fix ought to be to add relkind to
> RangeTblEntry. (BTW, so far as I can tell, RTE_SPECIAL is obsolete:
> there are assorted switch cases that handle it, but no place that can
> generate the value. I'm inclined to delete it while we are messing
> with this.)

> 2. In the current code layout, to make sense of relkind you need to
> #include catalog/pg_class.h where the values for relkind are #defined.
> I dislike the idea of that being true for a field of such a widely-known
> struct as RangeTblEntry. Accordingly, I suggest that we move those
> values into parsenodes.h. (Perhaps we could convert them to an enum,
> too, though still keeping the same ASCII values.)

> 3. We can have the rewriter update an RTE's stored value of relkind
> during AcquireRewriteLocks: it opens the rel for each RTE_RELATION entry
> anyway, so copying over the relkind is essentially free. While it's not
> instantly obvious that that is "soon enough", I think that it is, since
> up to the point of acquiring a lock there we can't assume that the rel
> isn't being changed or dropped undeneath us --- that is, any earlier
> test on an RTE's relkind might be testing just-obsoleted state anyway.

> 4. I had hoped that we might be able to get rid of some pre-existing
> syscache lookups, but at least so far as the parse/plan/execute chain
> is concerned, there don't seem to be any other places that need to
> fetch a relkind based on just an RTE entry.

> So point #4 is a bit discouraging, but other that, this seems like
> a fairly straightforward exercise. I'm inclined to go ahead and do it,
> unless there are objections.

Applied, except I ended up not moving the RELKIND #defines as suggested
in point #2. Those #defines are used by pg_dump, and having pg_dump.c
#include parsenodes.h seemed like a bad idea.

regards, tom lane