Re: SQL/MED - file_fdw

From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SQL/MED - file_fdw
Date: 2011-01-13 10:00:57
Message-ID: 20110113190056.828D.6989961C@metrosystems.co.jp
Lists: pgsql-hackers

On Fri, 7 Jan 2011 10:57:17 +0900
Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Mon, Dec 20, 2010 at 20:42, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> > I added comments and moved some setup codes for COPY TO to BeginCopyTo()
> > for maintainability. CopyTo() still contains parts of initialization,
> > but I've not touched it yet because we don't need the arrangement for now.
>
> I updated the COPY FROM API patch.
> - GetCopyExecutorState() is removed because FDWs will use their own context.

I rebased the file_fdw patch onto the recent copy_export patch, and have
some comments.

> The patch just rearranges codes for COPY FROM to export those functions.
> It also modifies some of COPY TO codes internally for code readability.
> - BeginCopyFrom(rel, filename, attnamelist, options)
> - EndCopyFrom(cstate)
> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid)
> - CopyFromErrorCallback(arg)

This API set seems sufficient to implement file_fdw on top of the COPY
routines.
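
To illustrate, here is a rough sketch (not the actual file_fdw code) of how
the scan callbacks could drive these routines.  FileFdwExecutionState and
the callback names are only illustrative; the BeginCopyFrom/NextCopyFrom/
EndCopyFrom signatures follow the ones quoted above.

/* hypothetical per-scan state kept in node->fdw_state */
typedef struct FileFdwExecutionState
{
    CopyState   cstate;         /* COPY FROM state for the data file */
} FileFdwExecutionState;

static void
fileBeginForeignScan(ForeignScanState *node, int eflags)
{
    FileFdwExecutionState *festate;
    char       *filename = "/path/to/accounts.csv";  /* from table options */
    List       *options = NIL;                        /* csv, delimiter, ... */

    festate = (FileFdwExecutionState *) palloc0(sizeof(FileFdwExecutionState));
    festate->cstate = BeginCopyFrom(node->ss.ss_currentRelation,
                                    filename, NIL, options);
    node->fdw_state = (void *) festate;
}

static TupleTableSlot *
fileIterateForeignScan(ForeignScanState *node)
{
    FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;

    ExecClearTuple(slot);

    /* fill the slot's values/nulls arrays directly from the next line */
    if (NextCopyFrom(festate->cstate,
                     slot->tts_values, slot->tts_isnull, NULL))
        ExecStoreVirtualTuple(slot);

    return slot;                /* an empty slot signals end of scan */
}

static void
fileEndForeignScan(ForeignScanState *node)
{
    FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;

    EndCopyFrom(festate->cstate);
}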

However, EndCopyFrom() does not seem to release the memory allocated in
BeginCopy() and BeginCopyFrom().  I noticed this by executing a query that
generates a nested loop plan (outer 1,000,000 rows * inner 10 rows); by the
end of the query the postgres backend had grown from 108MB to over 300MB
(VIRT column in top).

The attached patch avoids this leak by adding a per-copy memory context to
CopyState.  This might be overkill, though; providing a ResetCopyFrom()
instead might be more reasonable.
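
Roughly, the idea looks like this (simplified sketch; see the attached patch
for the real change): BeginCopy() creates a private context for the
CopyState, per-copy allocations happen inside it, and EndCopyFrom() frees
everything at once by deleting the context.

/* in BeginCopy() */
cstate->copycontext = AllocSetContextCreate(CurrentMemoryContext,
                                            "COPY",
                                            ALLOCSET_DEFAULT_MINSIZE,
                                            ALLOCSET_DEFAULT_INITSIZE,
                                            ALLOCSET_DEFAULT_MAXSIZE);
oldcontext = MemoryContextSwitchTo(cstate->copycontext);
/* ... existing BeginCopy()/BeginCopyFrom() allocations ... */
MemoryContextSwitchTo(oldcontext);

/* in EndCopyFrom() */
MemoryContextDelete(cstate->copycontext);   /* frees everything at once */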

Anyway, I couldn't find any performance degradation with this patch (tested
on my Linux box).

==============
# csv_accounts and csv_branches are generated by:
1) pgbench -i -s 10
2) COPY pgbench_accounts to '/path/to/accounts.csv' WITH CSV;
3) COPY pgbench_branches to '/path/to/branches.csv' WITH CSV;

<Original (no swapping occurred during the measurement)>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11717.01 rows=1 width=200) (actual time=0.300..100833.057 rows=1000000 loops=1)
   Join Filter: (b.bid = t.bid)
   ->  Foreign Scan on csv_accounts b  (cost=0.00..11717.00 rows=1 width=100) (actual time=0.148..4437.595 rows=1000000 loops=1)
   ->  Foreign Scan on csv_branches t  (cost=0.00..0.00 rows=1 width=100) (actual time=0.014..0.039 rows=10 loops=1000000)
 Total runtime: 102882.308 ms
(5 rows)

<Patched, Using per-copy context to release memory>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..11717.01 rows=1 width=200) (actual time=0.226..100931.864 rows=1000000 loops=1)
   Join Filter: (b.bid = t.bid)
   ->  Foreign Scan on csv_accounts b  (cost=0.00..11717.00 rows=1 width=100) (actual time=0.085..4439.777 rows=1000000 loops=1)
   ->  Foreign Scan on csv_branches t  (cost=0.00..0.00 rows=1 width=100) (actual time=0.015..0.039 rows=10 loops=1000000)
 Total runtime: 102684.276 ms
(5 rows)
==============

This memory leak is not a problem when the routines are used by the COPY
command itself, because it handles only one CopyState per query, and that is
cleaned up along with its parent context.

> Some items to be considered:
> - BeginCopyFrom() could receive filename as an option instead of a separated
> argument. If do so, file_fdw would be more simple, but it's a change only for
> file_fdw. COPY commands in the core won't be improved at all.

ISTM that the current design is better.

> - NextCopyFrom() returns values/nulls arrays rather than a HeapTuple. I expect
> the caller store the result into tupletableslot with ExecStoreVirtualTuple().
> It is designed for performance, but if the caller always needs an materialized
> HeapTuple, HeapTuple is better for the result type.

I tried adding tableoid to TupleTableSlot as tts_tableoid, but it seems to
make code such as slot_getattr() and other stuff tricky.

How about implementing it with materialized tuples, to avoid changes that
are unnecessary (at least for functionality)?  I would like to defer this
virtual-tuple optimization to the next development cycle, because it would
not affect the interface heavily.  I'll post a materialized-tuple version
of the foreign_scan patch soon.
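
As a sketch, the iterate callback in the materialized-tuple version could
look roughly like the following (names are illustrative, as in the earlier
sketch).  The tableoid is carried on the HeapTuple itself rather than on the
slot, at the cost of an extra heap_form_tuple() per row.

static TupleTableSlot *
fileIterateForeignScan(ForeignScanState *node)
{
    FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
    TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
    TupleDesc   tupdesc = slot->tts_tupleDescriptor;

    ExecClearTuple(slot);

    /* festate->values/nulls are assumed preallocated in BeginForeignScan */
    if (NextCopyFrom(festate->cstate, festate->values, festate->nulls, NULL))
    {
        HeapTuple   tuple = heap_form_tuple(tupdesc,
                                            festate->values, festate->nulls);

        /* carry the foreign table's OID so "SELECT tableoid" still works */
        tuple->t_tableOid = RelationGetRelid(node->ss.ss_currentRelation);

        /* the slot takes ownership and frees the tuple when cleared */
        ExecStoreTuple(tuple, slot, InvalidBuffer, true);
    }

    return slot;
}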

Regards,
--
Shigeru Hanada

Attachment Content-Type Size
20110113-copy_context.patch application/octet-stream 4.4 KB
