Re: Async execution of postgres_fdw.

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Async execution of postgres_fdw.
Date: 2014-12-15 09:09:44
Message-ID: 20141215.180944.242184148.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, this is the second round of 'introducing parallelism using
postgres_fdw'.

The two patches attached are as follows:

- 0001-Async-exec-of-postgres_fdw.patch
Main patch, which contains all of the functionality.

- 0002-rename-PGConn-variable.patch
Renames the conn variable for readability. No functional
effect.

* Outline of this patch

After considering the previous discussion and comments from
others, I judged that the original (WIP) patch was overdone as a
first step, so I reduced it to the minimal functionality. The new
patch does the following:

- Wrap PGconn in PgFdwConn in order to handle multiple scans
on one connection.

- Add the core async logic in fetch_more_data().

- Invoke remote commands asynchronously in ExecInitForeignScan.

- Cancel the async invocation if any other foreign scan, update,
or delete uses the same connection.

Cancellation is done by immediately fetching the result of the
already-invoked async command.
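
For reference, the core of the async fetching boils down to the standard
libpq pattern of PQsendQuery() followed later by PQgetResult(). A minimal
sketch, not the patch code itself (the cursor name, fetch size, and helper
names here are only illustrative):

#include "postgres.h"
#include "libpq-fe.h"

/* Ask the remote side for the next batch without waiting for it. */
static void
request_more_data(PGconn *conn)
{
	if (!PQsendQuery(conn, "FETCH 100 FROM c1"))
		elog(ERROR, "could not send fetch request: %s",
			 PQerrorMessage(conn));
}

/* Later, when the tuples are actually needed, collect that batch. */
static PGresult *
collect_data(PGconn *conn)
{
	PGresult   *res = PQgetResult(conn);	/* waits for the batch */
	PGresult   *tmp;

	/* libpq requires draining results until NULL before the next command */
	while ((tmp = PQgetResult(conn)) != NULL)
		PQclear(tmp);

	return res;
}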

* Where this patch will be effective.

With the upcoming inheritance-partitioning feature, this patch enables
starting and running foreign scans asynchronously. It is more
effective for longer turnaround (TAT) or remote startup times and for
a larger number of foreign servers, and it has no negative performance
effect in other situations.

* Concerns about this patch.

- This breaks the assumption that a scan starts at ExecForeignScan,
not ExecInitForeignScan, which might cause some problems.

- The error reporting code in do_sql_command is quite ugly.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Async-exec-of-postgres_fdw.patch text/x-patch 26.3 KB
0002-rename-PGConn-variable.patch text/x-patch 13.9 KB

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Async execution of postgres_fdw.
Date: 2014-12-18 06:10:22
Message-ID: CAFjFpRdgY0gF72k7vPm0s1LU7J8V1W9T8X-aFpEY3WT81dSmQg@mail.gmail.com
Lists: pgsql-hackers

Hi Horiguchi-san,
Here are my comments on both patches together.

Sanity
1. The patch applies cleanly but has trailing white spaces.
[ashutosh(at)ubuntu coderoot]git apply
/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch
/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:41: trailing whitespace.
entry->conn =
/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:44: trailing whitespace.

/mnt/hgfs/tmp/0001-Async-exec-of-postgres_fdw.patch:611: trailing
whitespace.

warning: 3 lines add whitespace errors.

2. The patches compile cleanly.
3. The regression tests are clean, even in contrib/postgres_fdw and
contrib/file_fdw.

Tests
-------
We need tests to make sure that the logic remains intact even after further
changes in this area. A couple of tests that involve multiple foreign scans
within the same query, fetching more rows than the fetch size (100), would
be required. Also, some DML statements involving multiple foreign scans
would test sanity when UPDATE/DELETE interleave with such scans. By multiple
foreign scans I mean both multiple scans on a single foreign server and
multiple scans spread across multiple foreign servers.

Code
-------
Because previous "conn" is now replaced by "conn->pgconn", the double
indirection makes the code a bit ugly and prone to segfaults (conn being
NULL or invalid pointer). Can we minimize such code or wrap it under a
macro?
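
For instance, even something as small as the following (the name is only a
suggestion) would at least centralize the dereference:

/* hypothetical accessor macro centralizing the double indirection;
 * callers would still need to guard against a NULL PgFdwConn itself */
#define PFC_PGCONN(fdwconn)	((fdwconn)->pgconn)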

We need some comments about the structure definition of PgFdwConn and its
members explaining the purpose of this structure and its members.

Same is the case with enum PgFdwConnState. In fact, the state diagram of a
connection has become more complicated with the async connections, so it
might be better to explain that state diagram at one place in the code
(through comments). The definition of the enum might be a good place to do
that. Otherwise, the logic of connection maintenance is spread at multiple
places and is difficult to understand by looking at the code.

In function GetConnection(), at line
elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
- entry->conn, server->servername);
+ entry->conn->pgconn, server->servername);

entry->conn->pgconn may not necessarily be a new connection and may be NULL
(as the next line checks it for being NULL). So I think this line should be
moved inside the following if block, after pgconn has been initialised with
the new connection.
+ if (entry->conn->pgconn == NULL)
+ {
+ entry->conn->pgconn = connect_pg_server(server, user);
+ entry->conn->nscans = 0;
+ entry->conn->async_state = PGFDW_CONN_IDLE;
+ entry->conn->async_scan = NULL;
+ }

The condition if (entry->conn == NULL) in GetConnection(), which used to
track whether there is a PGconn active for the given entry, now tracks
whether there is a PgFdwConn for it.

Please see more comments inline.

On Mon, Dec 15, 2014 at 2:39 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

>
>
> * Outline of this patch
>
> From some consideration after the previous discussion and
> comments from others, I judged the original (WIP) patch was
> overdone as the first step. So I reduced the patch to minimal
> function. The new patch does the following,
>
> - Wrapping PGconn by PgFdwConn in order to handle multiple scans
> on one connection.
>
> - The core async logic was added in fetch_more_data().
>

It might help if you can explain this logic in this mail as well as in code
(as per my comment above).

>
> - Invoking remote commands asynchronously in ExecInitForeignScan.
>

> - Canceling async invocation if any other foreign scans,
> modifies, deletes use the same connection.
>

> Cancellation is done by immediately fetching the return of
> already-invoked async command.
>

> * Where this patch will be effective.
>
> With upcoming inheritance-partition feature, this patch enables
> starting and running foreign scans asynchronously. It will be more
> effective for longer TAT or remote startup times, and larger
> number of foreign servers. No negative performance effect on
> other situations.
>
>
AFAIU, this logic sends only the first query in an asynchronous manner, not
all of them. Is that right? If yes, I think that's a severe limitation of
the feature. For a query involving multiple foreign scans, only the first
one will be done in async fashion and not the rest. Sorry if my
understanding is wrong.

I think we need some data which shows the speedup from this patch. You may
construct a case where a single query involves multiple foreign scans, and
we can check the speedup obtained against the number of foreign scans.

> * Concerns about this patch.
>
> - This breaks the assumption that scan starts at ExecForeignScan,
> not ExecInitForeignScan, which might cause some problem.
>
>
This should be fine as long as it doesn't have any side effects like
sending a query during EXPLAIN (which has been taken care of in this patch).
Do you think we need any special handling for PREPAREd statements?

> - error reporting code in do_sql_command is quite ugly..
>
>
>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2014-12-24 07:32:12
Message-ID: 20141224.163212.21062518.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, thank you for the comments, Ashutosh.

I'll return to this after the New Year holidays. Sorry for not
addressing them sooner, but I will have more time for this then.

Have happy holidays.

regards,


--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-09 08:30:03
Message-ID: 20150109.173003.218333899.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, thank you for the comment.

This is the second version of the patch.

- Refactored to make the code simpler and clearer.
- Added comments about the logic outline and the struct members.
- Removed trailing whitespace.

- No additional tests yet.

======
> warning: 3 lines add whitespace errors.

Whoops. Fixed.

> 2. The patches compile cleanly.
> 3. The regression is clean, even in contrib/postgres_fdw and
> contrib/file_fdw
>
> Tests
> -------
> We need tests to make sure that the logic remains intact even after further
> changes in this area. Couple of tests which require multiple foreign scans
> within the same query fetching rows more than fetch size (100) would be
> required. Also, some DMLs, which involve multiple foreign scans would test
> the sanity when UPDATE/DELETE interleave such scans. By multiple foreign
> scans I mean both multiple scans on a single foreign server and multiple
> scans spread across multiple foreign servers.

Additional tests might indeed be needed. Some of the testing related
to this patch is done implicitly in the present regression
tests, but there are no explicit ones.

fetch_size is currently a bare constant, so I think it is not really
necessary to test other fetch sizes. Even if a different size could
potentially cause a problem, it will be found when a different
number is actually applied.

In the current design, an async scan is started only for the first
scan on a connection; if the next scan or modify claims the same
connection, the async state is immediately finished and the
behaviour is the same as in the unpatched version. But since
asynchronous/parallel scanning is being introduced in some form,
that kind of test does seem to be needed.

Multi-server tests are not done in the unpatched version either, but
there is no difference between multiple foreign servers on the same
remote instance and servers distributed across multiple remotes. The
async scan of this patch works only within one foreign server, so
there seems to be no need for that kind of test. Do you have any
specific concern about this?

In any case, I will add some explicit tests for async canceling in
the next patch.

> Code
> -------
> Because previous "conn" is now replaced by "conn->pgconn", the double
> indirection makes the code a bit ugly and prone to segfaults (conn being
> NULL or invalid pointer). Can we minimize such code or wrap it under a
> macro?

Agreed. It was an annoyance for me as well. I've done the following
things to encapsulate PgFdwConn to some extent in the second
version of this patch. They are described below.

> We need some comments about the structure definition of PgFdwConn and its
> members explaining the purpose of this structure and its members.

Thank you for pointing that out; I had forgotten it. I added simple
comments there.

> Same is the case with enum PgFdwConnState. In fact, the state diagram of a
> connection has become more complicated with the async connections, so it
> might be better to explain that state diagram at one place in the code
> (through comments). The definition of the enum might be a good place to do
> that.

I added a comment describing the logic and the meaning of the
states just above the enum declaration.

> Otherwise, the logic of connection maintenance is spread at multiple
> places and is difficult to understand by looking at the code.
>
> In function GetConnection(), at line
> elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
> - entry->conn, server->servername);
> + entry->conn->pgconn, server->servername);

Thank you. I replaced conn accesses of this form with PFC_PGCONN(conn).

> entry->conn->pgconn may not necessarily be a new connection and may be NULL
> (as the next line check it for being NULL). So, I think this line should be
> moved within the following if block after pgconn has been initialised with
> the new connection.
> + if (entry->conn->pgconn == NULL)
> + {
> + entry->conn->pgconn = connect_pg_server(server, user);
> + entry->conn->nscans = 0;
> + entry->conn->async_state = PGFDW_CONN_IDLE;
> + entry->conn->async_scan = NULL;
> + }
>
> The if condition if (entry->conn == NULL) in GetConnection(), used to track
> whether there is a PGConn active for the given entry, now it tracks whether
> it has PgFdwConn for the same.

After some consideration, I decided to make PgFdwConn be handled
more like PGconn. The patch has shrunk as a result and looks
clearer.

- Added macros to encapsulate the PgFdwConn struct. (One of them is a function.)

- Added macros to call PQxxx functions taking a PgFdwConn.

- connect_pg_server() returns a PgFdwConn.

- connection.c does not touch the inside of PgFdwConn except at a
few points. PgFdwConn's memory is allocated with malloc(), as
PGconn's is, and freed by PFCfinish(), the counterpart of
PQfinish().

As a result of those changes, the patch has taken the following
shape.

- All places where PGconn was used now use PgFdwConn. These are
mostly simple replacements.

- The major functional changes are concentrated in
fetch_more_data(), postgresBeginForeignScan(), GetConnection(),
ReleaseConnection(), and the additional function
finish_async_connection().
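
Roughly, the wrapper and the way it is accessed now look like the
following (a simplified sketch, not the exact definitions in the patch;
the async-state bookkeeping field is omitted here):

#include "libpq-fe.h"

struct PgFdwScanState;			/* scan state defined in postgres_fdw.c */

typedef struct PgFdwConn
{
	PGconn	   *pgconn;		/* underlying libpq connection */
	int			nscans;		/* number of scans currently sharing it */
	struct PgFdwScanState *async_scan;	/* scan doing async fetch, or NULL */
} PgFdwConn;

/* accessor hiding the extra indirection */
#define PFC_PGCONN(conn)	((conn)->pgconn)

/* allocated with malloc() like a PGconn; freed by the PQfinish() counterpart */
extern void PFCfinish(PgFdwConn *conn);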

> Please see more comments inline.

> > * Outline of this patch
> >
> > From some consideration after the previous discussion and
> > comments from others, I judged the original (WIP) patch was
> > overdone as the first step. So I reduced the patch to minimal
> > function. The new patch does the following,
> >
> > - Wrapping PGconn by PgFdwConn in order to handle multiple scans
> > on one connection.
> >
> > - The core async logic was added in fetch_more_data().
> >
>
> It might help if you can explain this logic in this mail as well as in code
> (as per my comment above).

I wrote the outline of the logic in the comment for enum
PgFdwConnState in postgres_fdw.h. Does it make sense?

> > * Where this patch will be effective.
> >
> > With upcoming inheritance-partition feature, this patch enables
> > starting and running foreign scans asynchronously. It will be more
> > effective for longer TAT or remote startup times, and larger
> > number of foreign servers. No negative performance effect on
> > other situations.
> >
> >
> AFAIU, this logic sends only the first query in asynchronous manner not all
> of them. Is that right? If yes, I think it's a severe limitation of the
> feature. For a query involving multiple foreign scans, only the first one
> will be done in async fashion and not the rest. Sorry, if my understanding
> is wrong.

You're right on the first point. So the domain where I think this is
effective is sharding, where each remote server can have a
dedicated PGconn connection. In addition, the ongoing FDW join
pushdown work should increase the chances for async execution in
this manner. I found it difficult to settle on an appropriate
policy for managing the load on the remote server when there are
multiple PGconn connections to a single remote server, so that
would be the next issue.

> I think, we need some data which shows the speed up by this patch. You may
> construct a case, where a single query involved multiple foreign scans, and
> we can check what is the speed up obtained against the number of foreign
> scans.

Agreed; I'll show some such figures later.

> > * Concerns about this patch.
> >
> > - This breaks the assumption that scan starts at ExecForeignScan,
> > not ExecInitForeignScan, which might cause some problem.
> >
> >
> This should be fine as long as it doesn't have any side effects like
> sending query during EXPLAIN (which has been taken care of in this patch.)
> Do you think, we need any special handling for PREPAREd statements?

I suppose there's no difference between PREPAREd and
non-PREPAREd statements at the FDW level.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v2.patch text/x-patch 31.5 KB

From: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-12 09:36:20
Message-ID: CAFjFpRf7iXYENyUfMFJXfPB4v=yTw2J3ea2=yTXfG4=+E-QS5Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 9, 2015 at 2:00 PM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hello, thank you for the comment.
>
> This is the second version of the patch.
>
> - Refactored to make the code simpler and clearer.
> - Added comment about logic outline and struct members.
> - Removed trailing whitespace.
>
> - No additional test yet.
>
>
> ======
> > warning: 3 lines add whitespace errors.
>
> Whoops. Fixed.
>
> > 2. The patches compile cleanly.
> > 3. The regression is clean, even in contrib/postgres_fdw and
> > contrib/file_fdw
> >
> > Tests
> > -------
> > We need tests to make sure that the logic remains intact even after
> further
> > changes in this area. Couple of tests which require multiple foreign
> scans
> > within the same query fetching rows more than fetch size (100) would be
> > required. Also, some DMLs, which involve multiple foreign scans would
> test
> > the sanity when UPDATE/DELETE interleave such scans. By multiple foreign
> > scans I mean both multiple scans on a single foreign server and multiple
> > scans spread across multiple foreign servers.
>
> Additional tests indeed might be needed. Some of the test related
> to this patch are implicitly done in the present regression
> tests. But no explicit ones.
>
> fetch_size is currently a bare constant so I think it is not so
> necessary to test for other fetch sizes. Even if different size
> will potentially cause a problem, it will be found when the
> different number is actually applied.
>
> On the current design, async scan is started only on the first
> scan on the connection, and if the next scan or modify claims the
> same connection, the async state is immediately finished and
> behaves as the same as the unpatched version. But since
> asynchronous/parallel scan is introduced in any form, such kind
> of test seems to be needed.
>
> multi-server tests are not done also in the unpatched version but
> there's no difference between multiple foregn servers on the same
> remote server and them distributed on multiple remotes. The async
> scan of this patch works only on the same foreign server so there
> seems to be no need such kind of test. Do you have any specific
> concern about this?
>
> After all, I will add some explict tests for async-canceling in
> the next patch.
>
> > Code
> > -------
> > Because previous "conn" is now replaced by "conn->pgconn", the double
> > indirection makes the code a bit ugly and prone to segfaults (conn being
> > NULL or invalid pointer). Can we minimize such code or wrap it under a
> > macro?
>
> Agreed. It was annoyance also for me. I've done the following
> things to encapsulate PgFdwConn to some extent in the second
> version of this patch. They are described below.
>

Looks better.

>
> > We need some comments about the structure definition of PgFdwConn and its
> > members explaining the purpose of this structure and its members.
>
> Thank you for pointing that. I forgot that. I added simple
> comments there.
>
> > Same is the case with enum PgFdwConnState. In fact, the state diagram of
> a
> > connection has become more complicated with the async connections, so it
> > might be better to explain that state diagram at one place in the code
> > (through comments). The definition of the enum might be a good place to
> do
> > that.
>
> I added a comment describing the logic and the meaning of the
> states just above the enum declaration.
>

This needs to be clarified further. But that can wait till we finalise the
approach and the patch. In particular, the comment below is confusing:
1487 + * PGFDW_CONN_SYNC_RUNNING is rather an internal state in
1488 + * fetch_more_data(). It indicates that the function shouldn't send the next
1489 + * fetch request after getting the result.

I couldn't get the meaning of the second sentence, especially its connection
with synchronous-ness.

> Otherwise, the logic of connection maintenance is spread at multiple
> > places and is difficult to understand by looking at the code.
> >
> > In function GetConnection(), at line
> > elog(DEBUG3, "new postgres_fdw connection %p for server \"%s\"",
> > - entry->conn, server->servername);
> > + entry->conn->pgconn, server->servername);
>
> Thank you, I replaced conn's in this form with PFC_PGCONN(conn).
>

This looks better.

>
> > entry->conn->pgconn may not necessarily be a new connection and may be
> NULL
> > (as the next line check it for being NULL). So, I think this line should
> be
> > moved within the following if block after pgconn has been initialised
> with
> > the new connection.
> > + if (entry->conn->pgconn == NULL)
> > + {
> > + entry->conn->pgconn = connect_pg_server(server, user);
> > + entry->conn->nscans = 0;
> > + entry->conn->async_state = PGFDW_CONN_IDLE;
> > + entry->conn->async_scan = NULL;
> > + }
> >
> > The if condition if (entry->conn == NULL) in GetConnection(), used to
> track
> > whether there is a PGConn active for the given entry, now it tracks
> whether
> > it has PgFdwConn for the same.
>
> After some consideration, I decided to make PgFdwConn be
> handled more like PGconn. The patch has shrunk as a
> result and looks clearer.
>

I think it's still prone to segfaults considering two pointer indirections.

>
> - Added macros to encapsulate PgFdwConn struct. (One of them is a function)
>
> - Added macros to call PQxxx functions taking PgFdwConn.
>
> - connect_pg_server() returns PgFdwConn.
>
> - connection.c does not touch the inside of PgFdwConn except a
> few points. The PgFdwConn's memory is allocated with malloc()
> as PGconn and freed by PFCfinish() which is the correspondent
> of PQfinish().
>
> As the result of those chagnes, this patch has altered into the
> following shape.
>
> - All points where PGconn is used now uses PgFdwConn. They are
> seemingly simple replacements.
>
> - The major functional changes are concentrated within
> fetch_more_data(), postgreBeginForeignScan(), GetConnection() ,
> ReleaseConnection(), and the additional function
> finish_async_connection().
>
> > Please see more comments inline.
>
> > > * Outline of this patch
> > >
> > > From some consideration after the previous discussion and
> > > comments from others, I judged the original (WIP) patch was
> > > overdone as the first step. So I reduced the patch to minimal
> > > function. The new patch does the following,
> > >
> > > - Wrapping PGconn by PgFdwConn in order to handle multiple scans
> > > on one connection.
> > >
> > > - The core async logic was added in fetch_more_data().
> > >
> >
> > It might help if you can explain this logic in this mail as well as in
> code
> > (as per my comment above).
>
> I wrote the outline of the logic in the comment for enum
> PgFdwConnState in postgres_fdw.h. Does it make sense?
>

The point about two different ForeignScan nodes using the same connection
needs some clarification, I guess. It's not very clear why there would be
more than one query run on the same connection. I know why this happens,
but it's important to mention it somewhere. If it's already mentioned
somewhere in the file, sorry for not paying attention to that.

>
> > > * Where this patch will be effective.
> > >
> > > With upcoming inheritance-partition feature, this patch enables
> > > starting and running foreign scans asynchronously. It will be more
> > > effective for longer TAT or remote startup times, and larger
> > > number of foreign servers. No negative performance effect on
> > > other situations.
> > >
> > >
> > AFAIU, this logic sends only the first query in asynchronous manner not
> all
> > of them. Is that right? If yes, I think it's a severe limitation of the
> > feature. For a query involving multiple foreign scans, only the first one
> > will be done in async fashion and not the rest. Sorry, if my
> understanding
> > is wrong.
>
> You're right for the first point. So the domain I think this is
> effective is the case of sharding. Each remote server can have
> dedicate PGconn connection in the case. Addition to it, the
> ongoing FDW Join pushdown should increase the chance for async
> execution in this manner. I found that It is difficult to find
> the appropriate policy for managing the load on the remote server
> when multiple PGconn connection for single remote server, so it
> would be the next issue.
>

I think there is a good chance that more than one ForeignScan node will
try to interact with a foreign server, even after the push-down work.
The current solution doesn't address that. We actually need parallel
querying in two cases:
1. Querying multiple servers in parallel.
2. Querying the same server (by two scan nodes) in parallel within the same
query, e.g. an un-pushable join.

We need a solution which works in both cases.

Is it possible to use the parallel query infrastructure being built by
Robert, or to do something like the parallel seq scan? That would work not
just for postgres_fdw but for all FDWs.

>
> > I think, we need some data which shows the speed up by this patch. You
> may
> > construct a case, where a single query involved multiple foreign scans,
> and
> > we can check what is the speed up obtained against the number of foreign
> > scans.
>
> Agreed, I'll show you some such figures afterwards.
>
> > > * Concerns about this patch.
> > >
> > > - This breaks the assumption that scan starts at ExecForeignScan,
> > > not ExecInitForeignScan, which might cause some problem.
> > >
> > >
> > This should be fine as long as it doesn't have any side effects like
> > sending query during EXPLAIN (which has been taken care of in this
> patch.)
> > Do you think, we need any special handling for PREPAREd statements?
>
> I suppose there's no difference between PREPAREd and
> non-PREPAREd statements at the FDW level.
>

In the case of prepared statements, ExecInit is called at the end of
planning without subsequent execution, as in the EXPLAIN case. I see that
the patch handles EXPLAIN well, but I didn't see any specific code for
PREPARE.

>
> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-13 11:46:46
Message-ID: 20150113.204646.205937470.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello. This is a version 3 patch.

- PgFdwConnState is removed

- PgFdwConn is isolated as a separate module.

- The state transitions were simplified, I think.

- Comment about multiple scans on a connection is added.

- The issue of PREPARE is not addressed yet.

- It is meant to show how the new style looks, so it still lacks
comments for the individual PgFdwConn functions.

- Rebased to current master.

=======
> > I added a comment describing the logic and the meaning of the
> > states just above the enum declaration.
> >
>
> This needs to be clarified further. But that can wait till we finalise the
> approach and the patch. Esp. comment below is confusing
> 1487 + * PGFDW_CONN_SYNC_RUNNING is rather an internal state in
> 1488 + * fetch_more_data(). It indicates that the function shouldn't send
> the next
> 1489 + * fetch request after getting the result.
>
> I couldn't get the meaning of the second sentence, esp. it's connection
> with synchronous-ness.

In this version, I removed PgFdwConnState. Whether an async fetch
is running is now indicated by the existence of async_fetch. I
think the complicated state transitions have disappeared.

> > > The if condition if (entry->conn == NULL) in GetConnection(), used to
> > track
> > > whether there is a PGConn active for the given entry, now it tracks
> > whether
> > > it has PgFdwConn for the same.
> >
> > After some consideration, I decided to make PgFdwConn be
> > handled more like PGconn. The patch has shrunk as a
> > result and looks clearer.
> >
>
> I think it's still prone to segfaults considering two pointer indirections.

PGconn itself is already accessed through two levels of indirection, and
PgFdwConn hides its details mainly using macros. I may have
misunderstood you, but if you're worried that PgFdwConn.pgconn can
be set from anywhere, we should separate PgFdwConn into another
C module and hide all the details as PGconn does. That is shown as
the separate patch. But I feel it is a bit overdone, because it is
not an end-user interface.

> > I wrote the outline of the logic in the comment for enum
> > PgFdwConnState in postgres_fdw.h. Does it make sense?
> >
>
> The point about two different ForeignScan nodes using the same connection
> needs some clarification, I guess. It's not very clear, why would there be
> more queries run on the same connection. I know why this happens, but it's
> important to mention it somewhere. If it's already mentioned somewhere in
> the file, sorry for not paying attention to that.

Yes, that is just what I stumbled over. I changed the comment in
fetch_more_data() as below. Does it make sense?

| /*
| * In the current postgres_fdw implementation, multiple PgFdwScanStates
| * on the same foreign server and mapped user share the same
| * connection to the remote server (see GetConnection() in
| * connection.c), and individual scans on it are separated using
| * cursors. Since one connection cannot accept two or more
| * asynchronous queries simultaneously, we should stop the async
| * fetching if another scan comes along.
| */
|
| if (PFCgetNscans(conn) > 1)
| PFCsetAsyncScan(conn, NULL);

> I think there is more chance that there will more than one ForeignScan
> nodes trying interact with a foreign server, even after the push-down work.
> The current solution doesn't address that. We actually need parallel
> querying in two cases
> 1. Querying multiple servers in parallel
> 2. Querying same server (by two querists) in parallel within the same query
> e.g. an un-pushable join.
>
> We need a solution which is work in both the cases.

The first point is realized by this patch, with some
limitations. The second point is what my old patch did: it made a
dedicated connection for individual scans, up to some fixed number,
alongside the main connection; the overflow scans then went to the
main connection and were handled in the manner of the unpatched
postgres_fdw.

I was thinking that the 'some fixed number' could be set by a
foreign-server parameter, but I got a comment that it could
overload the remote server without reasonable load and/or bandwidth
management. So I abandoned the multiple-connection solution and
decided to do everything on the first connection. That is how the
current patch came about.

> Is it possible to use the parallel query infrastructure being built by
> Robert or to do something like parallel seq scan? That will work, not just
> for Postgres FDW but all the FDWs.

I haven't looked closely at that patch, but if my understanding from a
glance is correct, the parallel scan divides the target table
into multiple parts and then runs subscans on each part in parallel.

It might allocate a dedicated process for every child scan on a
partitioned table.

But I think that, from a performance point of view, each of several
foreign scans doesn't need a corresponding local process. Still, if the
parallel scan infrastructure makes the mechanism simpler, using
it is a very promising choice.

> In case of Prepared statements, ExecInit is called at the end of planning,
> without subsequent execution like the case of EXPLAIN. I see that the patch
> handles EXPLAIN well, but I didn't see any specific code for PREPARE.

I'll look into the case after this, but I'd like to send a
revised patch at this point.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v3.patch text/x-patch 35.4 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-14 08:24:50
Message-ID: 20150114.172450.102338109.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

> - The issue of PREPARE is not addressed yet.
...

> > In case of Prepared statements, ExecInit is called at the end of planning,
> > without subsequent execution like the case of EXPLAIN. I see that the patch
> > handles EXPLAIN well, but I didn't see any specific code for PREPARE.
>
> I'll look into the case after this, but I'd like to send a
> revised patch at this point.

Hmm. CreateExecutorState() seems to be called when evaluating
expressions in predicates, clauses, or EXECUTE parameters, and all
of these complete SQL execution, if any. I couldn't produce the
situation you mentioned.

Could you give me an example or illustration of a situation where
ExecInit alone is called without IterateForeignScan?

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-14 21:45:31
Message-ID: CA+TgmobL3c_4kK8k1SrZAffcTRYnQ3Jd7JhD0cpPP6yGiLsT2w@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 13, 2015 at 6:46 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> Is it possible to use the parallel query infrastructure being built by
>> Robert or to do something like parallel seq scan? That will work, not just
>> for Postgres FDW but all the FDWs.
>
> But, I think, from the performance view, every scan of multiple
> foreign scans don't need a corresponding local process.

Quite so. I think this is largely a separate project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-15 07:49:09
Message-ID: CAB7nPqSqu03EMdAfiZpSGqyYWWvPyhFLrjGTHrHXK4G5sHKT8g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 13, 2015 at 8:46 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I'll look into the case after this, but I'd like to send a
> revised patch at this point.
Hm. Seems like this patch is not completely baked yet. Horiguchi-san,
as you are obviously still working on it, would you agree to move it
to the next CF?
--
Michael


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: michael(dot)paquier(at)gmail(dot)com
Cc: ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-15 07:59:46
Message-ID: 20150115.165946.256597820.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

> > I'll look into the case after this, but I'd like to send a
> > revised patch at this point.
> Hm. Seems like this patch is not completely baked yet. Horiguchi-san,
> as you are obviously still working on it, would you agree to move it
> to the next CF?

Yes, that's fine with me. Thank you.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-16 08:18:49
Message-ID: 20150116.171849.109146500.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

I revised the patch so that async scans are done more
aggressively, and measured execution times for two very simple cases.

As a result, a simple seq scan gained 5% and a hash join of two
foreign tables gained 150% (2.4 times faster).

While measuring the performance, I noticed that in many cases, such
as hash joins or sorted joins, each scan in a query runs to
completion at once rather than alternating with the others. So I
modified the patch so that async fetching is done more
aggressively. The new v4 patch is attached; the following numbers
are based on it.

========
The first test is a simple seq scan.

> CREATE TABLE lt1 (a int, b timestamp, c text);
> CREATE SERVER sv1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'localhost');
> CREATE USER MAPPING FOR PUBLIC SERVER sv1;
> CREATE FOREIGN TABLE ft1 () SERVER sv1 OPTIONS (table_name 'lt1');
> INSERT INTO lt1 (SELECT a, now(), repeat('x', 128) FROM generate_series(0, 999999) a);

For this case, I took the average execution time of the following
query over 10 runs, for both the master head and the patched
version. The fetch size is 100.

> postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
> QUERY PLAN
> ------------------------------------------------------------------
> Foreign Scan on ft1 (actual time=0.795..4175.706 rows=1000000 loops=1)
> Planning time: 0.060 ms
> Execution time: 4276.043 ms

master head : avg = 4256.621, std dev = 17.099
patched pgfdw: avg = 4036.463, std dev = 2.608

The patched version is faster by about 5%. This should be purely the
result of asynchronous fetching, not including the effect of
starting remote execution early in ExecInit.

Interestingly, as the fetch size gets larger, the gain rises in
spite of the decreasing number of queries sent.

master head : avg = 2622.759, std dev = 38.379
patched pgfdw: avg = 2277.622, std dev = 27.269

That is about a 15% gain. And for a fetch size of 10000,

master head : avg = 2000.980, std dev = 6.434
patched pgfdw: avg = 1616.793, std dev = 13.192

A 19% gain. It is natural that execution time decreases as the fetch
size increases, but I haven't found the reason why the patch's gain
also increases.

======================

The second case is a simple join of two foreign tables sharing
one connection.

The master head runs this query in about 16 seconds with almost
no fluctuation among multiple tries.

> =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Hash Join (actual time=7541.831..15924.631 rows=1000000 loops=1)
> Hash Cond: (x.a = y.a)
> -> Foreign Scan on ft1 x (actual time=1.176..6553.480 rows=1000000 loops=1)
> -> Hash (actual time=7539.761..7539.761 rows=1000000 loops=1)
> Buckets: 32768 Batches: 64 Memory Usage: 2829kB
> -> Foreign Scan on ft1 y (actual time=1.067..6529.165 rows=1000000 loops=1)
> Planning time: 0.223 ms
> Execution time: 15973.916 ms

But the v4 patch mysteriously accelerates this query to 6.5 seconds.

> =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
> QUERY PLAN
> ----------------------------------------------------------------------------
> Hash Join (actual time=2556.977..5812.937 rows=1000000 loops=1)
> Hash Cond: (x.a = y.a)
> -> Foreign Scan on ft1 x (actual time=32.689..1936.565 rows=1000000 loops=1)
> -> Hash (actual time=2523.810..2523.810 rows=1000000 loops=1)
> Buckets: 32768 Batches: 64 Memory Usage: 2829kB
> -> Foreign Scan on ft1 y (actual time=50.345..1928.411 rows=1000000 loops=1)
> Planning time: 0.220 ms
> Execution time: 6512.043 ms

The result data does not seem to be broken. I don't know the reason
yet, but I'll investigate.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v4.patch text/x-patch 37.5 KB

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-19 06:24:16
Message-ID: 20150119.152416.57998318.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, that was a silly mistake: fetch_size was 10000 in the v4
patch. This v5 patch fixes that.

> But the v4 patch mysteriously accelerates this query, 6.5 seconds.
>
> > =# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c
> > FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
...
> > Execution time: 6512.043 ms

fetch_size was 10000 in that run. I got about 13.0 seconds for
fetch_size = 100, which is about 19% faster than the original.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center


Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v5.patch text/x-patch 37.5 KB

From: Matt Kelly <mkellycs(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, ashutosh(dot)bapat(at)enterprisedb(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-20 04:51:13
Message-ID: CA+KcUkhLUo+Vaj4xR8GVsof_nW79uDZTDYhOSdt13CFJkaEEdQ@mail.gmail.com
Lists: pgsql-hackers

I think it's telling that varying the fetch size doubled the performance,
even on localhost. If you were to repeat this test across a network, the
performance difference would be far more drastic.

I understand the desire to keep the fetch size small by default, but I
think your results demonstrate how important the value is. At the very
least, it is worth reconsidering this "arbitrary" value. However, I think
the real solution is to make this configurable. It probably should be a
new option on the foreign server or table, but an argument could be made
for it to be global across the server just like work_mem.

Obviously, this shouldn't block your current patch, but it's worth revisiting.

- Matt Kelly


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: mkellycs(at)gmail(dot)com
Cc: ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-20 08:47:13
Message-ID: 20150120.174713.187643362.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello, thank you for the comment. I added an experimental adaptive
fetch size feature in this v6 patch.

At Tue, 20 Jan 2015 04:51:13 +0000, Matt Kelly <mkellycs(at)gmail(dot)com> wrote in <CA+KcUkhLUo+Vaj4xR8GVsof_nW79uDZTDYhOSdt13CFJkaEEdQ(at)mail(dot)gmail(dot)com>
> I think it's telling that varying the fetch size doubled the performance,
> even on localhost. If you were to repeat this test across a network, the
> performance difference would be far more drastic.

I certainly think so.

> I understand the desire to keep the fetch size small by default, but I
> think your results demonstrate how important the value is. At the very
> least, it is worth reconsidering this "arbitrary" value. However, I think
> the real solution is to make this configurable. It probably should be a
> new option on the foreign server or table, but an argument could be made
> for it to be global across the server just like work_mem.

The optimal fetch size varies depending on the query. From the
performance point of view alone, it should equal the table size for
a simple scan of a table. Most joins also do not need to read the
target relations simultaneously. (A local merge join on remotely
sorted results is not possible, since the FDW is not aware of the
sortedness.) But that may change in the near future, so I have
found no appropriate policy for deciding the number.

Another point of view is the memory requirement. This wouldn't
matter if we used libpq's single-row mode, but that doesn't allow
multiple simultaneous queries. The space needed for the fetch
buffer varies widely in proportion to the average row length. If
it is 1 kilobyte, 10000 rows require over 10 megabytes, which is
larger than the default value of work_mem. In this version I tried
an adaptive fetch_size based on the fetch duration and the buffer
size required for the previous round. But a hard limit cannot be
imposed, since we cannot know the mean row length in advance. So,
for example, if the average row length suddenly grows from 1KB to
10KB when fetch_size is 10000, 100MB is required for that round. I
think that, for ordinary cases, the maximum fetch size cannot
exceed 1000.
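
The buffer-size part of that adaptation is roughly the following (a
simplified sketch with made-up names and constants, not the exact code in
the patch, which also takes the fetch duration into account):

#include <stddef.h>

#define MIN_FETCH_SIZE		100
#define MAX_FETCH_SIZE		10000
#define FETCH_BUFFER_BUDGET	(10 * 1024 * 1024)	/* bytes allowed per batch */

/* Choose the next fetch size from the size of the previous batch. */
static int
next_fetch_size(size_t last_batch_bytes, int last_batch_rows)
{
	size_t	avg_row_len;
	int		fetch_size;

	if (last_batch_rows <= 0)
		return MIN_FETCH_SIZE;		/* no history yet: start small */

	avg_row_len = last_batch_bytes / last_batch_rows;
	if (avg_row_len == 0)
		avg_row_len = 1;

	/* aim to keep the next batch within the memory budget */
	fetch_size = (int) (FETCH_BUFFER_BUDGET / avg_row_len);

	if (fetch_size < MIN_FETCH_SIZE)
		fetch_size = MIN_FETCH_SIZE;
	if (fetch_size > MAX_FETCH_SIZE)
		fetch_size = MAX_FETCH_SIZE;

	return fetch_size;
}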

Attached is the new version, implementing the adaptive fetch
size. Simple test runs showed the values below. A single scan was
boosted by about 5% (no effect?) and a join by 33%. The former
case is hard to understand, so I'll examine it tomorrow. This
doesn't seem so promising, though.

=====
master=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
QUERY PLAN
-------------------------------------------------------------------------
Foreign Scan on ft1 (actual time=1.741..10046.272 rows=1000000 loops=1)
Planning time: 0.084 ms
Execution time: 10145.730 ms
(3 rows)

patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT * FROM ft1;
QUERY PLAN
------------------------------------------------------------------------
Foreign Scan on ft1 (actual time=1.072..9582.980 rows=1000000 loops=1)
Planning time: 0.077 ms
Execution time: 9683.164 ms
(3 rows)

postgres=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
                                  QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join (actual time=18191.739..19534.001 rows=1000000 loops=1)
   Merge Cond: (x.a = y.a)
   ->  Sort (actual time=9031.155..9294.465 rows=1000000 loops=1)
         Sort Key: x.a
         Sort Method: external sort  Disk: 142728kB
         ->  Foreign Scan on ft1 x (actual time=1.156..6486.632 rows=1000000 loops=1)
   ->  Sort (actual time=9160.577..9479.076 rows=1000000 loops=1)
         Sort Key: y.a
         Sort Method: external sort  Disk: 146632kB
         ->  Foreign Scan on ft1 y (actual time=0.641..6517.594 rows=1000000 loops=1)
 Planning time: 0.203 ms
 Execution time: 19626.881 ms
(12 rows)

patched=# EXPLAIN (ANALYZE ON, COSTS OFF) SELECT x.a, x.c, y.c FROM ft1 AS x JOIN ft1 AS y on x.a = y.a;
                                  QUERY PLAN
--------------------------------------------------------------------------------------
 Merge Join (actual time=11790.690..13134.071 rows=1000000 loops=1)
   Merge Cond: (x.a = y.a)
   ->  Sort (actual time=8149.225..8413.611 rows=1000000 loops=1)
         Sort Key: x.a
         Sort Method: external sort  Disk: 142728kB
         ->  Foreign Scan on ft1 x (actual time=0.679..3989.160 rows=1000000 loops=1)
   ->  Sort (actual time=3641.457..3957.240 rows=1000000 loops=1)
         Sort Key: y.a
         Sort Method: external sort  Disk: 146632kB
         ->  Foreign Scan on ft1 y (actual time=0.605..1852.655 rows=1000000 loops=1)
 Planning time: 0.203 ms
 Execution time: 13226.414 ms
(12 rows)

> Obviously, this shouldn't block your current patch but its worth revisiting.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v6.patch text/x-patch 41.7 KB

From: Matt Kelly <mkellycs(at)gmail(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-21 05:22:34
Message-ID: CA+KcUkg4cvDLf4v0M9_rVv_ZuAsG1oDHPj_YvczJa6w2nSkwNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm trying to compare v5 and v6 on my laptop right now. Apparently my
laptop is quite a bit faster than your machine, because the tests
complete in roughly 3.3 seconds.

I added more data and didn't see anything other than noise. (Then again
the queries were dominated by the disk sort so I should retry with larger
work_mem). I'll try it again when I have more time to play with it. I
suspect the benefits would be more clear over a network.

Larger than the default work_mem, yes, but I think one of the prime
use cases for the fdw is more warehouse-style situations (PostgresXL
style use cases). In those cases, work_mem might reasonably be set to
1GB. Then even if you have 10KB rows you can fetch a million rows and
still be using less than work_mem. A simpler change would be to vary
it with respect to work_mem.

Half-baked idea: I know it's the wrong time in the execution phase,
but if you are using remote estimates for cost there should also be a
row width estimate, which I believe is derived from pg_statistic and
its mean column width.

It's actually a pity that there is no way to set fetch sizes based on
"give me as many tuples as will fit in less than x amount of memory",
because that is almost always exactly what you want. Even when
writing application code, I've never actually wanted precisely 10,000
rows; I've always wanted "a reasonably sized chunk that fits into
memory" and then backed my way into how many rows that is. If we were
to extend FETCH to support syntax like FETCH FORWARD '10MB' FROM ...;
then we would eliminate the need to estimate the value on the fly.

The async stuff, however, is a huge improvement over the last time I
played with the fdw. The two active postgres processes were easily
consuming a core and a half of CPU. I think it's not worth tying
these two things together; it's probably worth making them two
separate discussions and separate patches.

- Matt Kelly

*Just sanity-checking myself: shutting down the server, applying the
different patch, 'make clean install' in postgres_fdw, and then
restarting the server should obviously be sufficient to make sure
it's running the new code, because that is all linked at runtime,
right?


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: mkellycs(at)gmail(dot)com
Cc: ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-01-21 07:24:48
Message-ID: 20150121.162448.23571876.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello, thank you for looking at this, but I'm sorry that the last
patch was buggy, so the adaptive fetch size did not work.

The attached is the fixed patch. It apparently improves the
performance for the test case shown in the previous mail, in
which the average tuple length is about 140 bytes.

21 Jan 2015 05:22:34 +0000, Matt Kelly <mkellycs(at)gmail(dot)com> wrote in <CA+KcUkg4cvDLf4v0M9_rVv_ZuAsG1oDHPj_YvczJa6w2nSkwNQ(at)mail(dot)gmail(dot)com>
> I'm trying to compare v5 and v6 on my laptop right now. Apparently my
> laptop is quite a bit faster than your machine, because the tests
> complete in roughly 3.3 seconds.
>
> I added more data and didn't see anything other than noise. (Then again
> the queries were dominated by the disk sort so I should retry with larger
> work_mem). I'll try it again when I have more time to play with it. I
> suspect the benefits would be more clear over a network.
>
> Larger than the default work_mem, yes, but I think one of the prime
> use cases for the fdw is more warehouse-style situations (PostgresXL
> style use cases). In those cases, work_mem might reasonably be set to
> 1GB. Then even if you have 10KB rows you can fetch a million rows and
> still be using less than work_mem. A simpler change would be to vary
> it with respect to work_mem.

Agreed about the nature of the typical workload for postgres_fdw.
But I think the server itself, including postgres_fdw, should not
crash even on a sudden explosion of tuple length. From that
standpoint, 100 seems safe enough, 1000 seems questionable, and 10000
looks dangerous.

> Half-baked idea: I know it's the wrong time in the execution phase,
> but if you are using remote estimates for cost there should also be a
> row width estimate, which I believe is derived from pg_statistic and
> its mean column width.

It reduces the chance of claiming an unexpected amount of memory, but
the chance still remains.

> It's actually a pity that there is no way to set fetch sizes based on
> "give me as many tuples as will fit in less than x amount of memory",
> because that is almost always exactly what you want. Even when
> writing application code, I've never actually wanted precisely 10,000
> rows; I've always wanted "a reasonably sized chunk that fits into
> memory" and then backed my way into how many rows that is. If we were
> to extend FETCH to support syntax like FETCH FORWARD '10MB' FROM ...;
> then we would eliminate the need to estimate the value on the fly.

I hadn't thought about that. It makes sense, at least to me :) There
would be many cases where the *amount* of data matters more than the
number of rows. I'll work on it.
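
For what it's worth, here is a rough sketch of the client-side shape
of that idea using libpq's single-row mode (which, as noted above,
postgres_fdw itself cannot use because it needs multiple interleaved
queries on one connection). The function name and the "budget"
parameter are only illustrative.

#include <stdio.h>
#include <libpq-fe.h>

/* Illustration only: fetch rows until roughly `budget' bytes arrived. */
static void
fetch_up_to_bytes(PGconn *conn, const char *query, size_t budget)
{
    size_t      used = 0;
    PGresult   *res;

    if (!PQsendQuery(conn, query) || !PQsetSingleRowMode(conn))
    {
        fprintf(stderr, "could not send query: %s", PQerrorMessage(conn));
        return;
    }

    while ((res = PQgetResult(conn)) != NULL)
    {
        if (PQresultStatus(res) == PGRES_SINGLE_TUPLE)
        {
            int         i;

            /* count the bytes of this row, then hand it to the caller */
            for (i = 0; i < PQnfields(res); i++)
                used += PQgetlength(res, 0, i);
        }
        PQclear(res);

        if (used >= budget)
        {
            /* a real client would cancel the query here; for the sketch
             * we just drain the rest so the connection stays usable */
            while ((res = PQgetResult(conn)) != NULL)
                PQclear(res);
            break;
        }
    }
}

A server-side FETCH FORWARD '10MB' would move the same cut-off
decision to where the row sizes are actually known.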

> The async stuff, however, is a huge improvement over the last time I
> played with the fdw. The two active postgres processes were easily
> consuming a core and a half of CPU. I think it's not worth tying
> these two things together; it's probably worth making them two
> separate discussions and separate patches.

Yes, they can be separated, and they should be. I'll split them
after this.

> - Matt Kelly
>
> *Just sanity-checking myself: shutting down the server, applying the
> different patch, 'make clean install' in postgres_fdw, and then
> restarting the server should obviously be sufficient to make sure
> it's running the new code, because that is all linked at runtime,
> right?

Yes, that's enough, and I did the same. This patch touches only
postgres_fdw.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment Content-Type Size
0001-Asynchronous-execution-of-postgres_fdw-v7.patch text/x-patch 41.9 KB

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-05-09 01:24:41
Message-ID: 20150509012440.GF30322@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kyotaro,

* Kyotaro HORIGUCHI (horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp) wrote:
> The attached is the fixed patch. It apparently improves the
> performance for the test case shown in the previous mail, in
> which the average tuple length is about 140 bytes.

I'm all for improving performance of postgres_fdw and would like to see
us support sending queries off to be worked asynchronously, but starting
execution on the remote server during ExecInitNode is against the
documented FDW API spec. I discussed exactly this issue over a year
ago here:

http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net

Sadly, there weren't any direct responses to that email, but I do recall
having a discussion on another thread (or in person?) with Tom where we
ended up agreeing that we can't simply remove that requirement from the
docs or the API.

I certainly appreciate that you've put quite a bit of effort into
this, but I'm afraid we can't accept it while it starts running a
query on the remote side during the ExecInitNode phase. The query
cannot start executing on the remote side until IterateForeignScan is
called.

You might consider looking at the other suggestion in that email with
regard to adding an Async mechanism to the executor. I didn't get to
the point of writing code, but I did think about it a fair bit and still
believe that could work.

I'm not going to change the status of this patch in the CommitFest at
this time, in case anyone else feels I've misunderstood or not correctly
analyzed what the patch does (I'll admit, I've only read it and not
actually compiled it or run it with a debugger, but I'm pretty sure my
reading of what's happening is correct..), but I'm afraid this is going
to have to end up as Rejected.

Thanks!

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-05-09 04:17:53
Message-ID: 1608.1431145073@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> I'm all for improving performance of postgres_fdw and would like to see
> us support sending queries off to be worked asynchronously, but starting
> execution on the remote server during ExecInitNode is against the
> documented FDW API spec. I discussed exactly this issue over a year
> ago here:

> http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net

> Sadly, there weren't any direct responses to that email, but I do recall
> having a discussion on another thread (or in person?) with Tom where we
> ended up agreeing that we can't simply remove that requirement from the
> docs or the API.

Yeah. There are at least a couple of reasons why not:

* ExecInitNode only creates the runtime data structures; it should not
begin execution. It's possible, for example, that the scan will never
be iterated at all; say it's on the inside of a nestloop and the outer
relation turns out to be empty. It's not apparent why starting the
remote query a few microseconds sooner is worth the risk of demanding
useless computation.

* If the scan is parameterized (again, it's on the inside of a nestloop,
and the outer relation is supplying join key values), those parameter
values are simply not available at ExecInitNode time.
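
For illustration only (this is not the submitted patch, and every name
below is hypothetical), the API-compliant shape would be to keep the
begin callback down to building state, and to dispatch the remote
query asynchronously the first time IterateForeignScan is called, when
any parameter values are finally available:

#include "postgres.h"
#include "executor/tuptable.h"
#include "foreign/fdwapi.h"
#include "libpq-fe.h"

/* Hypothetical per-scan state; not the patch's PgFdwConn/PFC layer. */
typedef struct AsyncScanState
{
    PGconn     *conn;           /* connection from the connection cache */
    char       *query;          /* remote query text built at begin time */
    bool        query_sent;     /* has the query been dispatched yet? */
} AsyncScanState;

static void
sketch_BeginForeignScan(ForeignScanState *node, int eflags)
{
    AsyncScanState *state = (AsyncScanState *) palloc0(sizeof(AsyncScanState));

    /* build state->conn and state->query here, but send nothing yet */
    node->fdw_state = state;
}

static TupleTableSlot *
sketch_IterateForeignScan(ForeignScanState *node)
{
    AsyncScanState *state = (AsyncScanState *) node->fdw_state;

    if (!state->query_sent)
    {
        /* parameter values are available here, unlike at ExecInitNode time */
        if (!PQsendQuery(state->conn, state->query))
            elog(ERROR, "could not dispatch remote query");
        state->query_sent = true;
    }

    /* consume results with PQgetResult() and fill the scan slot ... */
    return ExecClearTuple(node->ss.ss_ScanTupleSlot);
}

With that shape, a scan that is never iterated costs nothing on the
remote side, and a parameterized scan sees its parameter values before
anything is sent.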

Also, so far as a quick review of the actual patch goes, I would really
like to see this lose the "PFC" wrapper layer, which accounts for 95% of
the code churn in the patch and doesn't seem to add any actual value.
What it does add is unchecked malloc failure conditions.
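
To spell that last point out with a hedged example (this is not code
from the patch): palloc() raises an error itself on allocation
failure, while a bare malloc() silently returns NULL and has to be
checked at every call site, roughly like this:

#include "postgres.h"

/* Illustration of the failure-handling difference only. */
static void *
checked_alloc(Size len)
{
    void       *a = palloc(len);    /* ereports ERROR on failure; no check */
    void       *b = malloc(len);    /* returns NULL on failure */

    if (b == NULL)
        ereport(ERROR,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));

    free(b);                        /* malloc'd memory is also outside
                                     * memory-context cleanup */
    return a;
}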

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-05-09 11:07:59
Message-ID: 20150509110759.GG30322@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > I'm all for improving performance of postgres_fdw and would like to see
> > us support sending queries off to be worked asynchronously, but starting
> > execution on the remote server during ExecInitNode is against the
> > documented FDW API spec. I discussed exactly this issue over a year
> > ago here:
>
> > http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net
>
> > Sadly, there weren't any direct responses to that email, but I do recall
> > having a discussion on another thread (or in person?) with Tom where we
> > ended up agreeing that we can't simply remove that requirement from the
> > docs or the API.
>
> Yeah. There are at least a couple of reasons why not:

Thanks for the reminders of those.

> Also, so far as a quick review of the actual patch goes, I would really
> like to see this lose the "PFC" wrapper layer, which accounts for 95% of
> the code churn in the patch and doesn't seem to add any actual value.
> What it does add is unchecked malloc failure conditions.

Agreed, the wrapper isn't doing anything particularly useful; I had
started out thinking that would be my first comment until it became
clear where all the performance improvement was coming from.

I've gone ahead and marked this as Rejected. The concept of async
execution of postgres_fdw is certainly still open and a worthwhile goal,
but this implementation isn't the way to achieve that.

Thanks!

Stephen


From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: sfrost(at)snowman(dot)net
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, mkellycs(at)gmail(dot)com, ashutosh(dot)bapat(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Async execution of postgres_fdw.
Date: 2015-05-11 02:37:19
Message-ID: 20150511.113719.84171254.horiguchi.kyotaro@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello.

> I've gone ahead and marked this as Rejected. The concept of async
> execution of postgres_fdw is certainly still open and a worthwhile goal,
> but this implementation isn't the way to achieve that.

That sounds fair. I'm satisfied that we have agreed that the goal is
worthwhile. I'll show other implementations soon.

Thank you.

> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Stephen Frost <sfrost(at)snowman(dot)net> writes:
> > > I'm all for improving performance of postgres_fdw and would like to see
> > > us support sending queries off to be worked asynchronously, but starting
> > > execution on the remote server during ExecInitNode is against the
> > > documented FDW API spec. I discussed exactly this issue over a year
> > > ago here:
> >
> > > http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net
> >
> > > Sadly, there weren't any direct responses to that email, but I do recall
> > > having a discussion on another thread (or in person?) with Tom where we
> > > ended up agreeing that we can't simply remove that requirement from the
> > > docs or the API.
> >
> > Yeah. There are at least a couple of reasons why not:
>
> Thanks for the reminders of those.
>
> > Also, so far as a quick review of the actual patch goes, I would really
> > like to see this lose the "PFC" wrapper layer, which accounts for 95% of
> > the code churn in the patch and doesn't seem to add any actual value.
> > What it does add is unchecked malloc failure conditions.
>
> Agreed, the wrapper isn't doing anything particularly useful; I had
> started out thinking that would be my first comment until it became
> clear where all the performance improvement was coming from.
>
> I've gone ahead and marked this as Rejected. The concept of async
> execution of postgres_fdw is certainly still open and a worthwhile goal,
> but this implementation isn't the way to achieve that.

--
Kyotaro Horiguchi
NTT Open Source Software Center