Re: [v9.3] writable foreign tables

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, PgHacker <pgsql-hackers(at)postgresql(dot)org>, Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Subject: Re: [v9.3] writable foreign tables
Date: 2012-09-23 06:25:44
Message-ID: CADyhKSW9Y4tOPo7SSsKWUuHnVezrGOKDJXH5XQcNm1w+EFTqug@mail.gmail.com
Lists: pgsql-hackers

2012/8/29 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
> 2012/8/28 Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>:
>> 2012/8/28 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>>> Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp> writes:
>>>>> Would it be too invasive to introduce a new pointer in TupleTableSlot
>>>>> that is NULL for anything but virtual tuples from foreign tables?
>>>
>>>> I'm not certain whether the duration of TupleTableSlot is enough to
>>>> carry a private datum between scan and modify stage.
>>>
>>> It's not.
>>>
>>>> Is it possible to utilize ctid field to move a private pointer?
>>>
>>> UPDATEs and DELETEs do not rely on the ctid field of tuples to carry the
>>> TID from scan to modify --- in fact, most of the time what the modify
>>> step is going to get is a "virtual" TupleTableSlot that hasn't even
>>> *got* a physical CTID field.
>>>
>>> Instead, the planner arranges for the TID to be carried up as an
>>> explicit resjunk column named ctid. (Currently this is done in
>>> rewriteTargetListUD(), but see also preptlist.c which does some related
>>> things for SELECT FOR UPDATE.)
>>>
>>> I'm inclined to think that what we need here is for FDWs to be able to
>>> modify the details of that behavior, at least to the extent of being
>>> able to specify a different data type than TID for the row
>>> identification column.
>>>
>> Hmm. It seems to me a straight-forward solution rather than ab-use
>> of ctid system column. Probably, cstring data type is more suitable
>> to carry a private datum between scan and modify stage.
>>
>> One problem I noticed is how FDW driver returns an extra field that
>> is in neither system nor regular column.
>> Number of columns and its data type are defined with TupleDesc of
>> the target foreign-table, so we also need a feature to extend it on
>> run-time. For example, FDW driver may have to be able to extend
>> a "virtual" column with cstring data type, even though the target
>> foreign table does not have such a column.
>>
> I tried to investigate the related routines.
>
> TupleDesc of TupleTableSlot associated with ForeignScanState
> is initialized at ExecInitForeignScan as literal.
> ExecAssignScanType assigns TupleDesc of the target foreign-
> table on tts_tupleDescriptor, "as-is".
> It is the reason why IterateForeignScan cannot return a private
> datum except for the columns being declared as regular ones.
>
The attached patch improves the design according to the upthread
discussion. It no longer abuses the "ctid" field; instead, it adopts
the concept of a pseudo-column that holds a row-id with an opaque
data type.

A pseudo-column is a Var reference to an attribute number larger
than the number of attributes of the target relation; thus, there is
no substantial object behind it. Normally such a larger attribute
number cannot be referenced, because the TupleDesc of each ScanState
associated with a particular relation is initialized at ExecInitNode
from the relation's own definition.

The patched ExecInitForeignScan generates its own TupleDesc,
including pseudo-column definitions, on the fly instead of using the
relation's one, when the scan plan of the foreign table requires
pseudo-columns.
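
To illustrate the idea only (this is not code from the patch), here is
a minimal standalone sketch. SketchAttr, SketchTupleDesc and
sketch_build_scan_desc are hypothetical stand-ins, not PostgreSQL
structures or functions; the sketch assumes that every attribute number
beyond the relation's own columns is a "rowid"-style pseudo-column.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one attribute of a TupleDesc. */
typedef struct SketchAttr {
    char name[32];
    int  is_pseudo;     /* 1 if the column is not defined on the relation */
} SketchAttr;

/* Hypothetical, simplified stand-in for a TupleDesc. */
typedef struct SketchTupleDesc {
    int         natts;
    SketchAttr *attrs;
} SketchTupleDesc;

/*
 * Build the descriptor used by the scan. "max_attr" is the largest
 * attribute number the plan references (1-based). If it exceeds the
 * relation's column count, the extra slots become pseudo-columns;
 * otherwise the result is a plain copy of the relation's descriptor.
 */
SketchTupleDesc *
sketch_build_scan_desc(const SketchTupleDesc *rel, int max_attr)
{
    int natts;
    SketchTupleDesc *desc;

    assert(rel != NULL && rel->natts > 0);
    natts = (max_attr > rel->natts) ? max_attr : rel->natts;

    desc = malloc(sizeof(*desc));
    assert(desc != NULL);
    desc->natts = natts;
    desc->attrs = calloc((size_t) natts, sizeof(SketchAttr));
    assert(desc->attrs != NULL);

    /* Regular columns are copied as-is from the relation. */
    memcpy(desc->attrs, rel->attrs, (size_t) rel->natts * sizeof(SketchAttr));

    /* Anything beyond the relation's columns is a pseudo-column. */
    for (int i = rel->natts; i < natts; i++) {
        strcpy(desc->attrs[i].name, "rowid");
        desc->attrs[i].is_pseudo = 1;
    }
    return desc;
}

/* Release a descriptor built by sketch_build_scan_desc(). */
void
sketch_free_desc(SketchTupleDesc *desc)
{
    free(desc->attrs);
    free(desc);
}
```

With a two-column relation and max_attr = 3, the result has three
attributes and the third is marked as a pseudo-column; with
max_attr = 2 it is just a copy of the relation's descriptor.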

Right now, the only possible pseudo-column is "rowid", which is
injected at rewriteTargetListUD(). Unlike "ctid", it has no data
format restriction because its data type is VOID.
In its IterateForeignScan method, an FDW extension can set an
appropriate value in the "rowid" field, in addition to the contents
of regular columns, to track which remote row should be updated or
deleted.

Another possible use of pseudo-columns is push-down of a target
list that includes complex calculations. It may make it possible to
move a complex mathematical formula to a remote device
(such as a GPU device?) instead of just referencing a Var node.

This patch adds a new interface, GetForeignRelInfo, which is invoked
from get_relation_info() to adjust the width of RelOptInfo->attr_needed
according to the target list, which may contain the "rowid" pseudo-column.
Some FDW extensions may use this interface to push down a part of the
target list to the remote side, though I didn't implement this
feature in file_fdw.

RelOptInfo->max_attr is a good marker of whether the plan has a
pseudo-column reference. Based on it, ExecInitForeignScan determines
whether it should generate its own TupleDesc or not.

The "rowid" is fetched using ExecGetJunkAttribute, as we currently
do for regular tables using "ctid", and is then delivered to
ExecUpdate or ExecDelete. We can no longer assume that their first
argument is an ItemPointer, so "ItemPointer tupleid" was redefined as
"Datum rowid", and the argument of the BR-trigger routines was
redefined as well.

[kaigai(at)iwashi sepgsql]$ cat ~/testfile.csv
10 aaa
11 bbb
12 ccc
13 ddd
14 eee
15 fff
[kaigai(at)iwashi sepgsql]$ psql postgres
psql (9.3devel)
Type "help" for help.

postgres=# UPDATE ftbl SET b = md5(b) WHERE a > 12 RETURNING *;
INFO: ftbl is the target relation of UPDATE
INFO: fdw_file: BeginForeignModify method
INFO: fdw_file: UPDATE (lineno = 4)
INFO: fdw_file: UPDATE (lineno = 5)
INFO: fdw_file: UPDATE (lineno = 6)
INFO: fdw_file: EndForeignModify method
a | b
----+----------------------------------
13 | 77963b7a931377ad4ab5ad6a9cd718aa
14 | d2f2297d6e829cd3493aa7de4416a18f
15 | 343d9040a671c45832ee5381860e2996
(3 rows)

UPDATE 3
postgres=# DELETE FROM ftbl WHERE a % 2 = 1 RETURNING *;
INFO: ftbl is the target relation of DELETE
INFO: fdw_file: BeginForeignModify method
INFO: fdw_file: DELETE (lineno = 2)
INFO: fdw_file: DELETE (lineno = 4)
INFO: fdw_file: DELETE (lineno = 6)
INFO: fdw_file: EndForeignModify method
a | b
----+-----
11 | bbb
13 | ddd
15 | fff
(3 rows)

DELETE 3
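
The DELETE above can be modeled with a small standalone sketch. It is
not taken from the patch: SketchDatum, SketchRow and the sketch_*
functions are hypothetical, and it assumes that the "rowid" simply
carries the 1-based line number, as the "lineno = N" messages suggest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t SketchDatum;  /* hypothetical stand-in for Datum */

typedef struct SketchRow {
    int         a;
    const char *b;
    SketchDatum rowid;      /* pseudo-column, opaque outside the FDW */
    int         deleted;
} SketchRow;

/* Scan stage: besides the regular columns, fill in the rowid. */
void
sketch_scan(SketchRow *rows, size_t nrows)
{
    assert(rows != NULL);
    for (size_t i = 0; i < nrows; i++)
        rows[i].rowid = (SketchDatum) (i + 1);  /* 1-based line number */
}

/*
 * Modify stage: only the rowid is handed over, and the FDW decodes it
 * to find the row. Returns the number of rows processed (0 or 1).
 */
int
sketch_delete(SketchRow *rows, size_t nrows, SketchDatum rowid)
{
    size_t lineno = (size_t) rowid;

    if (lineno < 1 || lineno > nrows || rows[lineno - 1].deleted)
        return 0;
    rows[lineno - 1].deleted = 1;
    return 1;
}

/* Driver equivalent to: DELETE FROM ftbl WHERE a % 2 = 1 */
int
sketch_delete_odd(SketchRow *rows, size_t nrows)
{
    int count = 0;

    sketch_scan(rows, nrows);
    for (size_t i = 0; i < nrows; i++)
        if (rows[i].a % 2 == 1)
            count += sketch_delete(rows, nrows, rows[i].rowid);
    return count;
}
```

Fed with the six rows of testfile.csv, sketch_delete_odd() marks
lines 2, 4 and 6 as deleted and returns 3.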

In addition, there is a small improvement. ExecForeignInsert,
ExecForeignUpdate and ExecForeignDelete are now able to return the
number of processed rows; that allows pushing down the whole
statement to the remote side if it is simple enough
(e.g., a DELETE statement without any condition).

Even though it does not matter right now, pseudo-columns should be
adjusted when a foreign table is referenced through the table
inheritance feature, because an attribute number that is large
enough in the parent table may not be large enough in a child table.
We need to fix this up before the foreign table feature gets
inheritance capability.
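
A tiny sketch of that problem, with one possible fix. Both functions
are hypothetical; the patch does not implement any remapping, and the
"keep the same offset past the last real column" rule is only an
assumption for illustration.

```c
#include <assert.h>

/* An attribute number (1-based) is a pseudo-column if it lies beyond
 * the relation's real columns. */
int
sketch_is_pseudo(int attno, int rel_natts)
{
    return attno > rel_natts;
}

/* Assumed fix: keep the pseudo-column at the same offset past the
 * child's last real column as it had past the parent's. */
int
sketch_remap_pseudo_attno(int parent_attno, int parent_natts, int child_natts)
{
    assert(sketch_is_pseudo(parent_attno, parent_natts));
    return child_natts + (parent_attno - parent_natts);
}
```

For a parent with 2 columns, attribute number 3 is a pseudo-column,
but in a child with 4 columns the same number points at a real column;
the remap above would move it to 5.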

I didn't update the documentation because I consider this patch
a proof-of-concept for now. Please note that.

Thanks,
--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

Attachment Content-Type Size
pgsql-v9.3-writable-fdw-poc.v2.patch application/octet-stream 46.2 KB
