From: | Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp> |
---|---|
To: | Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: SQL/MED - file_fdw |
Date: | 2011-01-13 10:00:57 |
Message-ID: | 20110113190056.828D.6989961C@metrosystems.co.jp |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, 7 Jan 2011 10:57:17 +0900
Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Mon, Dec 20, 2010 at 20:42, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> > I added comments and moved some setup codes for COPY TO to BeginCopyTo()
> > for maintainability. CopyTo() still contains parts of initialization,
> > but I've not touched it yet because we don't need the arrangement for now.
>
> I updated the COPY FROM API patch.
> - GetCopyExecutorState() is removed because FDWs will use their own context.
I rebased file_fdw patch to recent copy_export patch, and have some
comments.
> The patch just rearranges codes for COPY FROM to export those functions.
> It also modifies some of COPY TO codes internally for code readability.
> - BeginCopyFrom(rel, filename, attnamelist, options)
> - EndCopyFrom(cstate)
> - NextCopyFrom(cstate, OUT values, OUT nulls, OUT tupleOid)
> - CopyFromErrorCallback(arg)
This API set seems to be enough to implement file_fdw using COPY
routines.
But EndCopyFrom() seems not to be able to release memory which is
allocated in BeginCopy() and BeginCopyFrom(). I found this behavior
by executing a query which generates nested loop plan (outer 1000000
row * inner 10 row), and at last postgres grows up to 300MB+ from
108MB (VIRT of top command).
Attached patch would avoid this leak by adding per-copy context to
CopyState. This would be overkill, and ResetCopyFrom() might be
reasonable though.
Anyway, I couldn't find performance degrade with this patch (tested on
my Linux box).
==============
# csv_accounts and csv_branches are generated by:
1) pgbench -i -s 10
2) COPY pgbench_accounts to '/path/to/accounts.csv' WITH CSV;
3) COPY pgbench_branches to '/path/to/branches.csv' WITH CSV;
<Original (There is no memory swap during measurement) >
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..11717.01 rows=1 width=200) (actual time=0.300..100833.057 rows=1000000 loops=1)
Join Filter: (b.bid = t.bid)
-> Foreign Scan on csv_accounts b (cost=0.00..11717.00 rows=1 width=100) (actual time=0.148..4437.595 rows=1000000 loops=1)
-> Foreign Scan on csv_branches t (cost=0.00..0.00 rows=1 width=100) (actual time=0.014..0.039 rows=10 loops=1000000)
Total runtime: 102882.308 ms
(5 rows)
<Patched, Using per-copy context to release memory>
postgres=# explain analyze select * from csv_accounts b, csv_branches t where t.bid = b.bid;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..11717.01 rows=1 width=200) (actual time=0.226..100931.864 rows=1000000 loops=1)
Join Filter: (b.bid = t.bid)
-> Foreign Scan on csv_accounts b (cost=0.00..11717.00 rows=1 width=100) (actual time=0.085..4439.777 rows=1000000 loops=1)
-> Foreign Scan on csv_branches t (cost=0.00..0.00 rows=1 width=100) (actual time=0.015..0.039 rows=10 loops=1000000)
Total runtime: 102684.276 ms
(5 rows)
==============
This memory leak would not be problem when using from COPY command
because it handles only one CopyState in a query, and it will be
cleaned up with parent context.
> Some items to be considered:
> - BeginCopyFrom() could receive filename as an option instead of a separated
> argument. If do so, file_fdw would be more simple, but it's a change only for
> file_fdw. COPY commands in the core won't be improved at all.
ISTM that current design would be better.
> - NextCopyFrom() returns values/nulls arrays rather than a HeapTuple. I expect
> the caller store the result into tupletableslot with ExecStoreVirtualTuple().
> It is designed for performance, but if the caller always needs an materialized
> HeapTuple, HeapTuple is better for the result type.
I tried to add tableoid to TupleTableSlot as tts_tableoid, but it
seems to make codes such as slot_getaddr() and other staff tricky.
How about to implement using materialized tuples to avoid unnecessary
(at least for functionality) changes. I would like to send this
virtual-tuple-optimization to next development cycle because it would
not effect the interface heavily. I'll post materialized-tuple
version of foreign_scan patch soon.
Regards,
--
Shigeru Hanada
Attachment | Content-Type | Size |
---|---|---|
20110113-copy_context.patch | application/octet-stream | 4.4 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Joel Jacobson | 2011-01-13 10:13:54 | Possible bug in pg_settings/pg_depend |
Previous Message | Heikki Linnakangas | 2011-01-13 10:00:38 | Re: pg_ctl failover Re: Latches, signals, and waiting |