Re: SQL/MED - file_fdw

From: Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SQL/MED - file_fdw
Date: 2011-01-11 09:20:14
Message-ID: 20110111182013.BD6F.6989961C@metrosystems.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 10 Jan 2011 19:26:11 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Shigeru HANADA <hanada(at)metrosystems(dot)co(dot)jp> writes:
> > For the purpose of file_fdw, additional ResetCopyFrom() would be
> > necessary. I'm planning to include such changes in file_fdw patch.
> > Please find attached partial patch for ResetCopyFrom(). Is there
> > anything else which should be done at reset?
>
> Seems like it would be smarter to close and re-open the copy operation.
> Adding a reset function is just creating an additional maintenance
> burden and point of failure, for what seems likely to be a negligible
> performance benefit.

Agreed. fileReScan can be implemented with close/re-open with storing
some additional information into FDW private area. I would withdraw
the proposal.

> If you think it's not negligible, please show some proof of that before
> asking us to support such code.

Anyway, I've measured overhead of re-open with executing query
including inner join between foreign tables copied from pgbench schema.
I used SELECT statement below:

EXPLAIN (ANALYZE) SELECT count(*) FROM csv_accounts a JOIN
csv_branches b ON (b.bid = a.bid);

On the average of (Nested Loop - (Foreign Scan * 2)), overhead of
re-open is round 0.048ms per tuple (average of 3 times measurement).

After the implementation of file_fdw, I'm going to measure again. If
ResetCopyFrom significantly improves performance of ReScan, I'll
propose it as a separate patch.

=========================================================================

The results of EXPLAIN ANALYZE are:

[using ResetCopyFrom]
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11717.02..11717.03 rows=1 width=0) (actual time=73357.655..73357.657 rows=1 loops=1)
-> Nested Loop (cost=0.00..11717.01 rows=1 width=0) (actual time=0.209..71424.059 rows=1000000 loops=1)
-> Foreign Scan on public.csv_accounts a (cost=0.00..11717.00 rows=1 width=4) (actual time=0.144..6998.497 rows=1000000 loops=1)
-> Foreign Scan on public.csv_branches b (cost=0.00..0.00 rows=1 width=4) (actual time=0.008..0.037 rows=10 loops=1000000)
Total runtime: 73358.135 ms
(11 rows)

[using EndCopyFrom + BeginCopyFrom]
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=11717.02..11717.03 rows=1 width=0) (actual time=120724.138..120724.140 rows=1 loops=1)
-> Nested Loop (cost=0.00..11717.01 rows=1 width=0) (actual time=0.321..118583.681 rows=1000000 loops=1)
-> Foreign Scan on public.csv_accounts a (cost=0.00..11717.00 rows=1 width=4) (actual time=0.156..7208.968 rows=1000000 loops=1)
-> Foreign Scan on public.csv_branches b (cost=0.00..0.00 rows=1 width=4) (actual time=0.016..0.046 rows=10 loops=1000000)
Total runtime: 121118.792 ms
(11 rows)

Time: 121122.205 ms

=========================================================================

Regards,
--
Shigeru Hanada

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2011-01-11 09:24:46 Re: ALTER TYPE 0: Introduction; test cases
Previous Message Jeff Davis 2011-01-11 09:16:47 WIP: RangeTypes