Re: postgres_fdw vs data formatting GUCs (was Re: [v9.3] writable foreign tables)

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: postgres_fdw vs data formatting GUCs (was Re: [v9.3] writable foreign tables)
Date: 2013-03-20 01:10:06
Message-ID: CAAZKuFa0H8XxRF9PgKcHj=ffBZNHibcBQJLJBX67nc0ozu-wMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 19, 2013 at 3:06 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Tue, Mar 19, 2013 at 2:41 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Daniel Farina <daniel(at)heroku(dot)com> writes:
>>> On Tue, Mar 19, 2013 at 1:16 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> I'd be inclined to eat the cost of calling PQparameterStatus every time
>>>> (which won't be that much) and instead try to optimize by avoiding the
>>>> GUC-setting overhead if the current value matches the local setting.
>>>> But even that might be premature optimization. Did you do any
>>>> performance testing to see if there was a problem worth avoiding?
>>
>>> Nope; should I invent a new way to do that, or would it be up to
>>> commit standard based on inspection alone? I'm okay either way, let
>>> me know.
>>
>> Doesn't seem that hard to test: run a dblink query that pulls back a
>> bunch of data under best-case conditions (ie, local not remote server),
>> and time it with and without the proposed fix. If the difference
>> is marginal then it's not worth working hard to optimize.
>
> Okay, will do, and here's the shorter and less mechanically intensive
> naive version that I think is the baseline: it doesn't try to optimize
> out any GUC settings and sets up the GUCs before the two
> materialization paths in dblink.

The results. Summary: seems like grabbing the GUC and strcmp-ing is
worthwhile, but the amount of ping-ponging between processes adds some
noise to the timing results: utilization is far short of 100% on
either processor involved. Attached is a cumulative diff of the new
version, and also reproduced below are the changes to v2 that make up
v3.

## Benchmark

SELECT dblink_connect('benchconn','dbname=contrib_regression');

CREATE FUNCTION bench() RETURNS integer AS $$
DECLARE
iterations integer;
BEGIN
iterations := 0;

WHILE iterations < 300000 LOOP
PERFORM * FROM dblink('benchconn', 'SELECT 1') AS t(a int);
iterations := iterations + 1;
END LOOP;

RETURN iterations;
END;
$$ LANGUAGE plpgsql;

SELECT clock_timestamp() INTO TEMP beginning;
SELECT bench();
SELECT clock_timestamp() INTO TEMP ending;

SELECT 'dblink-benchmark-lines';
SELECT ending.clock_timestamp - beginning.clock_timestamp
FROM beginning, ending;

## Data Setup

CREATE TEMP TABLE bench_results (version text, span interval);

COPY bench_results FROM STDIN WITH CSV;
no-patch,@ 41.308122 secs
no-patch,@ 36.63597 secs
no-patch,@ 34.264119 secs
no-patch,@ 34.760179 secs
no-patch,@ 32.991257 secs
no-patch,@ 34.538258 secs
no-patch,@ 42.576354 secs
no-patch,@ 39.335557 secs
no-patch,@ 37.493206 secs
no-patch,@ 37.812205 secs
v2-applied,@ 36.550553 secs
v2-applied,@ 38.608723 secs
v2-applied,@ 39.415744 secs
v2-applied,@ 46.091052 secs
v2-applied,@ 45.425438 secs
v2-applied,@ 48.219506 secs
v2-applied,@ 43.514878 secs
v2-applied,@ 45.892302 secs
v2-applied,@ 48.479335 secs
v2-applied,@ 47.632041 secs
v3-strcmp,@ 32.524385 secs
v3-strcmp,@ 34.982098 secs
v3-strcmp,@ 34.487222 secs
v3-strcmp,@ 44.394681 secs
v3-strcmp,@ 44.638309 secs
v3-strcmp,@ 44.113088 secs
v3-strcmp,@ 45.497769 secs
v3-strcmp,@ 33.530176 secs
v3-strcmp,@ 32.9704 secs
v3-strcmp,@ 40.84764 secs
\.

=> SELECT version, avg(extract(epoch from span)), stddev(extract(epoch
from span))
FROM bench_results
GROUP BY version;
version | avg | stddev
------------+------------+------------------
no-patch | 37.1715227 | 3.17076487912923
v2-applied | 43.9829572 | 4.30572672565711
v3-strcmp | 38.7985768 | 5.54760393720725

## Changes to v2:

--- a/contrib/dblink/dblink.c
+++ b/contrib/dblink/dblink.c
@@ -2981,9 +2981,11 @@ initRemoteGucs(remoteGucs *rgs, PGconn *conn)
static void
applyRemoteGucs(remoteGucs *rgs)
{
- int i;
const int numGucs = sizeof parseAffectingGucs / sizeof *parseAffectingGucs;

+ int i;
+ int addedGucNesting = false;
+
/*
* Affected types require local GUC manipulations. Create a new
* GUC NestLevel to overlay the remote settings.
@@ -2992,14 +2994,27 @@ applyRemoteGucs(remoteGucs *rgs)
* structure, so expect it to come with an invalid NestLevel.
*/
Assert(rgs->localGUCNestLevel == -1);
- rgs->localGUCNestLevel = NewGUCNestLevel();

for (i = 0; i < numGucs; i += 1)
{
+ int gucApplyStatus;
+
const char *gucName = parseAffectingGucs[i];
const char *remoteVal = PQparameterStatus(rgs->conn, gucName);
+ const char *localVal = GetConfigOption(gucName, true, true);

- int gucApplyStatus;
+ /*
+ * Attempt to avoid GUC setting if the remote and local GUCs
+ * already have the same value.
+ */
+ if (strcmp(remoteVal, localVal) == 0)
+ continue;
+
+ if (!addedGucNesting)
+ {
+ rgs->localGUCNestLevel = NewGUCNestLevel();
+ addedGucNesting = true;
+ }

gucApplyStatus = set_config_option(gucName, remoteVal,
PGC_USERSET, PGC_S_SESSION,

--
fdr

Attachment Content-Type Size
dblink-guc-sensitive-types-v3.patch application/octet-stream 14.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-03-20 02:05:26 Re: Enabling Checksums
Previous Message Greg Smith 2013-03-20 00:54:14 Re: Enabling Checksums