review: FDW API

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Postgres - Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
Subject: review: FDW API
Date: 2011-01-16 00:55:19
Message-ID: 4D3241F7.7070007@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

what follows is a review of the FDW API patch from
http://archives.postgresql.org/message-id/20110114212358.82C7.6989961C@metrosystems.co.jp

All three patches apply cleanly and compile without warnings. Regression
tests pass.

Let me go patch by patch, starting with the first one that adds the
HANDLER option.

It adds one useless hunk in src/backend/commands/foreigncmds.c (changes
order if #includes).

There's a typo in a C commnent ("determin which validator to be used").

Other than that, it looks OK.

The third patch just adds a GetForeignTable helper function and it looks OK.

The second patch actually adds the API. First of all, I'd like say that
it's a really cool piece of code, allowing all kinds of awesome
funcionality to be added. I'm already excited by the things that this
will make possible. Congratulations!

To get a feel of the API I wrote a simple FDW wrapper that presents data
from the commitfest RSS feed, based heavily on twitter_fdw by Hitoshi
Harada. Please treat my thoughts as someone's who doesn't really know
*why* the API looks like it does, but has some observations about what
was missing or what felt strange when using it. I guess that's the
position a typical FDW wrapper writer will be in.

First of all, the C comments mention that PlanRelScan should put a tuple
descriptor in FdwPlan, but there's no such field in it. Also, comments
for PlanRelScan talk about the 'attnos' argument, which is not in the
function's signature. I guess the comments are just obsolete and should
be updated. I think it's actually a good thing you don't have to put a
TupleDesc in FdwPlan.

There are two API methods, PlanNative and PlanQuery that are ifdef'd out
using IN_THE_FUTURE. Isn't the standard symbol we use NOT_USED? Also,
the comments say you can implement either PlanRelScan or PlanNative, and
only the former is available for now. If we keep these methods
commented, they should be moved to the end of the struct, so that
uncommenting them will not break compatibility with existing FDWs.

The only documentation a FDW writer has is fdwapi.h, so comments there
need to be top-notch. We might contemplate writing a documentation
chapter explaining how FDW handlers should be written, like we do for C
SRFs and libpq programs, but I guess for now just the headers file would
be enough.

FdwExecutionState is just a struct around a void pointer, can we imagine
adding more fields there? If not, maybe we could just remove the
structure and pass void pointers around? OTOH that gives us some
compiler checking and possibility of extending the struct, so I guess we
could also just leave it like that.

The Iterate method gets passed a TupleTableSlot. Do we really need such
a low-level structure? It makes returning the result easy, because you
just form your tuple and call ExecStoreTuple, but perhaps we could
abstract that away a bit and add a helper method that will take a tuple
and call ExecStoreTuple for us, passing InvalidBuffer and false as the
remaining arguments. Or maybe make Iterate return the tuple and call
ExecStoreTuple internally? I don't have strong opinions, but
TupleTableSlot feels a bit too gutty - I'm having a hard time imagining
what fields from it would be useful for a FDW writer, and so perhaps you
don't need to expose it.

Why does BeginScan accept a ParamListInfo argument? First of all, it
feels like a parsing thing, not a relation scan thing, so perhaps it
should be available at some other, earlier stage. Second of all, what
would it be useful for anyway? Neither file_fdw nor my commitfest_fdw
does anything with it.

We could use comments about how to return tuples from Iterate and how to
finish returning them. I had to look at the example to figure out that
you need ExecClearTuple(slot) in your last Iterate. If Iterate's
interface were to change, we could just return NULL instead of a tuple
to say that we're done.

We could be a bit more explicit about how to allocate objects, for
instance if I'm allocating a FdwPlan in my PlanRelScan with a palloc,
will it not go away too soon, or too late (IOW leak)?

I ran into a problem when doing:

select i from generate_series(1, 100000) as g(i), pgcommitfest;

when I was trying to check for leaks. It returned four rows
(pgcommitfest is the foreign table that returns four rows). Explain
analyze shows a nested loop with a foreign scan as inner loop. Maybe
it's because I didn't implement ReScan, but the API says I don't have to.

If you don't implement Iterate you get a elog(ERROR). But if you don't
implement one of the other required methods, you segfault. Feels
inconsistent.

PlanRelScan looks like something that could use all kinds of information
to come up with a good plan. Maybe we could change its input argument to
a single struct that would contain all the current arguments, so it'll
be easier to extend when people will start writing FDWs and will find
out that they'd like more information available. Doing that would mean
that adding a field in a release would mean FDWs just need to be
recompiled and they keep working. Or do we opt for the "better an
explicitly compile error than a silent change" approach? If we'd be just
passing additional info to PlanRelScan I'd say keeping old source
compilable would not be a problem.

Storing private info in FdwPlan's private list is really awkward. I know
it's because you need copyObject support, but it's just a big pain if
you want to store several different things. Passing them around as a big
list and doing list_nth(private, MY_WIDGET_OFFSET) feels like writing
Lisp before you learn it has structures :) Is there really no other way?

Maybe PlanNative should get the foreign table OID, not the server OID,
to resemble PlanRelScan more. The server OID is just a syscache lookup
away, anyway.

If you do EXPLAIN SELECT * FROM foregintable, BeginScan is not called,
but EndScan is. That's weird, and I noticed because I got a segfault
when EndScan tried to free things that BeginScan allocated. Maybe just
don't call EndScan for EXPLAIN?

That's all as far as the API goes. Feel free to ignore most of these
remarks if you see a reason why your choices are better (or necessary).
I just thought I'd try to take a look at it as a user would (which is
what I am, as I don't fully understand the internals) and offer my
impressions.

In general, the feature looks great and I hope it'll make it into 9.1.
And it we'd get the possibility to write FDW handlers in other PLs than
C, it would rock so hard...

I'm going to mark this a Waiting for Author because of the typos, the
BeginScan/EndScan issue, and the nested loop stopping halfway issue. The
rest are suggestions or just thoughts, and if you don't see them as
justified, I'll mark the next patch Ready for Committer.

Cheers,
Jan

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2011-01-16 01:26:24 Re: psql: Add \dL to show languages
Previous Message Charles Pritchard 2011-01-16 00:45:19 W3C Specs: Web SQL Revisit