Re: postgres_fdw behaves oddly

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw behaves oddly
Date: 2014-11-24 06:07:37
Message-ID: CAFjFpRcz4Gyj48PF14Rvpm2CTFRuhnrVyKWZjgh26AYhbdpxhw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Nov 23, 2014 at 2:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
> > (2014/11/19 18:21), Ashutosh Bapat wrote:
> >> Ok. I added that comment to the commitfest and changed the status to
> >> "ready for commiter".
>
> > Many thanks!
>
> I committed this with some cosmetic adjustments, and one not-so-cosmetic
> one: I left it continuing to allow CTID conditions to be sent to the
> remote server. That works today and we shouldn't disable it, especially
> not in the back branches where it could mean a performance regression
> in a minor release.
>
>
Thanks.

> As for the question of whether reltargetlist or tlist should be examined,
> reltargetlist is the correct thing. I do not see a need to modify
> use_physical_tlist either. If you trace through what happens in e.g.
> postgres_fdw, you'll notice that it only bothers to retrieve actually-used
> columns (which it computes correctly) from the remote side. It then
> constructs a scan tuple that corresponds to the foreign table's physical
> column set, inserting NULLs for any unfetched columns. This is handed
> back to execScan.c and matches what a heapscan or indexscan would produce.
> The point of the use_physical_tlist optimization is to avoid a projection
> step *on this tuple* if we don't really need to do one (ie, if it'd be
> just as good for the scan node's output tuple to be identical to the row's
> physical tuple). If we disabled use_physical_tlist then we'd be forcing
> a projection step that would have the effect of removing the NULLs for
> unfetched columns, but it would not actually be of value, just as it isn't
> in the corresponding cases for heap/index scans.
>
> For the current patch, that's fair.

In grouping_planner() the targetlist of the topmost plan node of join tree
is changed to the set of expressions requested in the query.
1554 else
1555 {
1556 /*
1557 * Otherwise, just replace the subplan's flat
tlist with
1558 * the desired tlist.
1559 */
1560 result_plan->targetlist = sub_tlist;
1561 }
This means that for a simple SELECT from the foreign table, projection
would be necessary in ExecScan if all the columns are not required.

So, the strategy to emulate heap or index scan wastes extra cycles and
memory in adding NULL columns which are not required and then filtering
them during the projection (within ExecScan itself). It also limits the
ability of foreign scans to fetch shippable expressions when those
expressions are part of the targetlist and fetching their values (instead
of the columns participating in the expressions) reduces the size of the
fetched row.

These cases do not exist for heap scans or index scans because the tuples
are obtained directly from the buffer, so no extra memory required for the
extra columns and projection as well as expression evaluation is anyway
required.

BTW, given that there wasn't any very good way to test the createplan.c
> fix except by adding test cases to postgres_fdw, I didn't see much value
> in splitting the patch in two. The committed patch includes tests that
> the createplan.c bug is fixed as well as the postgres_fdw one.
>
> regards, tom lane
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-11-24 07:20:12 Re: [TODO] Track number of files ready to be archived in pg_stat_archiver
Previous Message Amit Kapila 2014-11-24 03:29:03 Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]