Re: Optimization for updating foreign tables in Postgres FDW

From: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimization for updating foreign tables in Postgres FDW
Date: 2014-09-02 04:10:36
Message-ID: CAEZqfEcx6monLMhuJwkSZRkL+ZrBdrV+Uv4LuWzekDEYLK_Unw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I confirmed performance gain accomplished by this patch.

This patch makes update queries ~50x faster, and even hit-miss update
is 3x faster than original. Of course benefit is only for queries
whose qualifiers are enough simple so that they can be pushied down
fully, but this improvement is remarkable.

This patch avoids 1) SELECT for determining target rows, and 2)
repeated per-row UPDATE/DELETE in particular situation, so I assumed
that the gain is larger for bulk update, and it's true indeed, but in
fact even hit-miss update (0 row affected) become faster enough. This
would come from the omission of SELECT preceding repeated
UPDATE/DELETE.

I was little worried about overhead in planning phase, but fluctuation
was less than 1ms, so it's negligible.

Measurement Result
==================

Note: numbers below are "execution time" of EXPLAIN ANALYZE, and
average of five runs
--------------+-------------+-----------+--------
rows affected | original | patched | gain
--------------+-------------+-----------+--------
0 | 4.841 | 1.548 | 3.13x
1 | 6.944 | 1.793 | 3.87x
100 | 174.420 | 5.167 | 33.76x
10,000 | 8,215.551 | 163.832 | 50.15x
100,000 | 78,135.905 | 1,595.739 | 48.97x
200,000 | 179,784.928 | 4,305.856 | 41.75x
--------------+-------------+-----------+--------

Measurement procedure
=====================

[Local side]
1) Create foreign table which refers pgbench_accounts on the remote side

[Remote side]
2) pgbench -i -s 100
3) Execute ANALYZE
4) Restart PostgreSQL to clear shared buffers

[Local side]
5) Execute ANALYZE against foreign table
6) Execute UPDATE SQL against foreign table once for warm the cache
7) Execute UPDATE SQL against foreign table five times

Test SQL for 10000-rows cas is below, only aid condition is changed
according to measurement variation.

EXPLAIN ANALYZE VERBOSE UPDATE ft_pgbench_accounts SET bid=bid+1,
abalance=abalance+1, filler='update test' WHERE aid<=10000;

2014-08-29 12:59 GMT+09:00 Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>:
> (2014/08/26 12:20), Etsuro Fujita wrote:
>>
>> (2014/08/25 21:58), Albe Laurenz wrote:
>>>
>>> I played with it, and apart from Hanada's comments I have found the
>>> following:
>>>
>>> test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id > 3;
>>> QUERY PLAN
>>>
>>> ----------------------------------------------------------------------------------------------------------------------------------
>>>
>>> Update on laurenz.rtest (cost=100.00..14134.40 rows=299970
>>> width=10) (actual time=0.005..0.005 rows=0 loops=1)
>>> -> Foreign Scan on laurenz.rtest (cost=100.00..14134.40
>>> rows=299970 width=10) (actual time=0.002..0.002 rows=299997 loops=1)
>>> Output: id, val, ctid
>>> Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE
>>> ((id > 3))
>>> Planning time: 0.179 ms
>>> Execution time: 3706.919 ms
>>> (6 rows)
>>>
>>> Time: 3708.272 ms
>>>
>>> The "actual time" readings are surprising.
>>> Shouldn't these similar to the actual execution time, since most of
>>> the time is spent
>>> in the foreign scan node?
>>
>>
>> I was also thinkng that this is confusing to the users. I think this is
>> because the patch executes the UPDATE/DELETE statement during
>> postgresBeginForeignScan, not postgresIterateForeignScan, as you
>> mentioned below:
>>
>>> Reading the code, I noticed that the pushed down UPDATE or DELETE
>>> statement is executed
>>> during postgresBeginForeignScan rather than during
>>> postgresIterateForeignScan.
>
>
>> I'll modify the patch so as to execute the statement during
>> postgresIterateForeignScan.
>
>
> Done.
>
>
>>> It is not expected that postgresReScanForeignScan is called when the
>>> UPDATE/DELETE
>>> is pushed down, right? Maybe it would make sense to add an assertion
>>> for that.
>>
>>
>> IIUC, that is right. As ModifyTable doesn't support rescan currently,
>> postgresReScanForeignScan needn't to be called in the update pushdown
>> case. The assertion is a good idea. I'll add it.
>
>
> Done.
>
> You can find the updated version of the patch at
>
> http://www.postgresql.org/message-id/53FFFA50.6020007@lab.ntt.co.jp
>
>
> Thanks,
>
> Best regards,
> Etsuro Fujita

--
Shigeru HANADA

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2014-09-02 04:11:35 Re: PL/pgSQL 2
Previous Message Craig Ringer 2014-09-02 04:06:21 Re: Concurrently option for reindexdb