Quick Links

Re: asynchronous execution

Lists:	pgsql-hackers

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2016-09-24 01:09:03
Message-ID:	CA+TgmoaXQEt4tZ03FtQhnzeDEMzBck+Lrni0UWHVVgOTnA6C1w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

[ Adjusting subject line to reflect the actual topic of discussion better. ]

On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> For e.g., in the above plan which you specified, suppose :
>> 1. Hash Join has called ExecProcNode() for the child foreign scan b, and so
>> is
>> waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
>> 2. The event wait list already has foreign scan on a that is on a different
>> subtree.
>> 3. This foreign scan a happens to be ready, so in
>> ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called,
>> which returns with result_ready.
>> 4. Since it returns result_ready, it's parent node is now inserted in the
>> callbacks array, and so it's parent (Append) is executed.
>> 5. But, this Append planstate is already in the middle of executing Hash
>> join, and is waiting for HashJoin.
>
> Ah, yeah, something like that could happen. I've spent much of this
> week working on a new design for this feature which I think will avoid
> this problem. It doesn't work yet - in fact I can't even really test
> it yet. But I'll post what I've got by the end of the day today so
> that anyone who is interested can look at it and critique.

Well, I promised to post this, so here it is. It's not really working
all that well at this point, and it's definitely not doing anything
that interesting, but you can see the outline of what I have in mind.
Since Kyotaro Horiguchi found that my previous design had a
system-wide performance impact due to the ExecProcNode changes, I
decided to take a different approach here: I created an async
infrastructure where both the requestor and the requestee have to be
specifically modified to support parallelism, and then modified Append
and ForeignScan to cooperate using the new interface. Hopefully that
means that anything other than those two nodes will suffer no
performance impact. Of course, it might have other problems....

Some notes:

- EvalPlanQual rechecks are broken.
- EXPLAIN ANALYZE instrumentation is broken.
- ExecReScanAppend is broken, because the async stuff needs some way
of canceling an async request and I didn't invent anything like that
yet.
- The postgres_fdw changes pretend to be async but aren't actually.
It's just a demo of (part of) the interface at this point.
- The postgres_fdw changes also report all pg-fdw paths as
async-capable, but actually the direct-modify ones aren't, so the
regression tests fail.
- Errors in the executor can leak the WaitEventSet. Probably we need
to modify ResourceOwners to be able to own WaitEventSets.
- There are probably other bugs, too.

Whee!

Note that I've tried to solve the re-entrancy problems by (1) putting
all of the event loop's state inside the EState rather than in local
variables and (2) having the function that is called to report arrival
of a result be thoroughly different than the function that is used to
return a tuple to a synchronous caller.

Comments welcome, if you're feeling brave enough to look at anything
this half-baked.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment	Content-Type	Size
async-wip-2016-09-23.patch	binary/octet-stream	41.6 KB

From:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2016-09-28 04:30:08
Message-ID:	CAJ3gD9fRmEhUoBMnNN8K_QrHZf7m4rmOHTFDj492oeLZff8o=w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 24 September 2016 at 06:39, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Since Kyotaro Horiguchi found that my previous design had a
> system-wide performance impact due to the ExecProcNode changes, I
> decided to take a different approach here: I created an async
> infrastructure where both the requestor and the requestee have to be
> specifically modified to support parallelism, and then modified Append
> and ForeignScan to cooperate using the new interface. Hopefully that
> means that anything other than those two nodes will suffer no
> performance impact. Of course, it might have other problems....

I see that the reason why you re-designed the asynchronous execution
implementation is because the earlier implementation showed
performance degradation in local sequential and local parallel scans.
But I checked that the ExecProcNode() changes were not that
significant as to cause the degradation. It will not call
ExecAsyncWaitForNode() unless that node supports asynchronism. Do you
feel there is anywhere else in the implementation that is really
causing this degrade ? That previous implementation has some issues,
but they seemed solvable. We could resolve the plan state recursion
issue by explicitly making sure the same plan state does not get
called again while it is already executing.

Thanks
-Amit Khandekar

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-09-29 05:39:12
Message-ID:	20160929.143912.223203543.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Sorry for delayed response, I'll have enough time from now and
address this.

At Fri, 23 Sep 2016 21:09:03 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaXQEt4tZ03FtQhnzeDEMzBck+Lrni0UWHVVgOTnA6C1w(at)mail(dot)gmail(dot)com>
> Well, I promised to post this, so here it is. It's not really working
> all that well at this point, and it's definitely not doing anything
> that interesting, but you can see the outline of what I have in mind.
> Since Kyotaro Horiguchi found that my previous design had a
> system-wide performance impact due to the ExecProcNode changes, I
> decided to take a different approach here: I created an async
> infrastructure where both the requestor and the requestee have to be
> specifically modified to support parallelism, and then modified Append
> and ForeignScan to cooperate using the new interface. Hopefully that
> means that anything other than those two nodes will suffer no
> performance impact. Of course, it might have other problems....
>
> Some notes:
>
> - EvalPlanQual rechecks are broken.
> - EXPLAIN ANALYZE instrumentation is broken.
> - ExecReScanAppend is broken, because the async stuff needs some way
> of canceling an async request and I didn't invent anything like that
> yet.
> - The postgres_fdw changes pretend to be async but aren't actually.
> It's just a demo of (part of) the interface at this point.
> - The postgres_fdw changes also report all pg-fdw paths as
> async-capable, but actually the direct-modify ones aren't, so the
> regression tests fail.
> - Errors in the executor can leak the WaitEventSet. Probably we need
> to modify ResourceOwners to be able to own WaitEventSets.
> - There are probably other bugs, too.
>
> Whee!
>
> Note that I've tried to solve the re-entrancy problems by (1) putting
> all of the event loop's state inside the EState rather than in local
> variables and (2) having the function that is called to report arrival
> of a result be thoroughly different than the function that is used to
> return a tuple to a synchronous caller.
>
> Comments welcome, if you're feeling brave enough to look at anything
> this half-baked.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	amitdkhan(dot)pg(at)gmail(dot)com
Cc:	robertmhaas(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-09-29 06:30:39
Message-ID:	20160929.153039.51784306.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, thank you for the comment.

At Wed, 28 Sep 2016 10:00:08 +0530, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote in <CAJ3gD9fRmEhUoBMnNN8K_QrHZf7m4rmOHTFDj492oeLZff8o=w(at)mail(dot)gmail(dot)com>
> On 24 September 2016 at 06:39, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Since Kyotaro Horiguchi found that my previous design had a
> > system-wide performance impact due to the ExecProcNode changes, I
> > decided to take a different approach here: I created an async
> > infrastructure where both the requestor and the requestee have to be
> > specifically modified to support parallelism, and then modified Append
> > and ForeignScan to cooperate using the new interface. Hopefully that
> > means that anything other than those two nodes will suffer no
> > performance impact. Of course, it might have other problems....
>
> I see that the reason why you re-designed the asynchronous execution
> implementation is because the earlier implementation showed
> performance degradation in local sequential and local parallel scans.
> But I checked that the ExecProcNode() changes were not that
> significant as to cause the degradation.

The basic thought is that we don't allow degradation of as small
as around one percent for simple cases in exchange for this
feature (or similar ones).

Very simple case of SeqScan runs through a very short path, on
where prediction failure penalties of CPU by few branches results
in visible impact. I avoided that by using likely/unlikly but
more fundamental measure is preferable.

> It will not call ExecAsyncWaitForNode() unless that node
> supports asynchronism.

That's true, but it takes a certain amount of CPU cycle to decide
call it or not. The small bit of time is the issue in focus now.

> Do you feel there is anywhere else in
> the implementation that is really causing this degrade ? That
> previous implementation has some issues, but they seemed
> solvable. We could resolve the plan state recursion issue by
> explicitly making sure the same plan state does not get called
> again while it is already executing.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-10-03 10:46:32
Message-ID:	20161003.194632.204401048.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for the thought.

At Fri, 23 Sep 2016 21:09:03 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoaXQEt4tZ03FtQhnzeDEMzBck+Lrni0UWHVVgOTnA6C1w(at)mail(dot)gmail(dot)com>
> [ Adjusting subject line to reflect the actual topic of discussion better. ]
>
> On Fri, Sep 23, 2016 at 9:29 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Fri, Sep 23, 2016 at 8:45 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> >> For e.g., in the above plan which you specified, suppose :
> >> 1. Hash Join has called ExecProcNode() for the child foreign scan b, and so
> >> is
> >> waiting in ExecAsyncWaitForNode(foreign_scan_on_b).
> >> 2. The event wait list already has foreign scan on a that is on a different
> >> subtree.
> >> 3. This foreign scan a happens to be ready, so in
> >> ExecAsyncWaitForNode (), ExecDispatchNode(foreign_scan_a) is called,
> >> which returns with result_ready.
> >> 4. Since it returns result_ready, it's parent node is now inserted in the
> >> callbacks array, and so it's parent (Append) is executed.
> >> 5. But, this Append planstate is already in the middle of executing Hash
> >> join, and is waiting for HashJoin.
> >
> > Ah, yeah, something like that could happen. I've spent much of this
> > week working on a new design for this feature which I think will avoid
> > this problem. It doesn't work yet - in fact I can't even really test
> > it yet. But I'll post what I've got by the end of the day today so
> > that anyone who is interested can look at it and critique.
>
> Well, I promised to post this, so here it is. It's not really working
> all that well at this point, and it's definitely not doing anything
> that interesting, but you can see the outline of what I have in mind.
> Since Kyotaro Horiguchi found that my previous design had a
> system-wide performance impact due to the ExecProcNode changes, I
> decided to take a different approach here: I created an async
> infrastructure where both the requestor and the requestee have to be
> specifically modified to support parallelism, and then modified Append
> and ForeignScan to cooperate using the new interface. Hopefully that
> means that anything other than those two nodes will suffer no
> performance impact. Of course, it might have other problems....

The previous framework didn't need to distinguish async-capable
and uncapable nodes from the parent node's view. The things in
ExecProcNode was required for the reason. Instead, this new one
removes the ExecProcNode stuff by distinguishing the two kinds of
node in async-aware parents, that is, Append. This no longer
involves async-unaware nodes into the tuple bubbling-up mechanism
so the reentrant problem doesn't seem to occur.

On the other hand, for example, the following plan, regrardless
of its practicality, (there should be more good example..)

(Async-unaware node)
- NestLoop
- Append
- n * ForegnScan
- Append
- n * ForegnScan

If the NestLoop, Append are async-aware, all of the ForeignScans
can run asynchronously with the previous framework. The topmost
NestLoop will be awakened after that firing of any ForenScans
makes a tuple bubbles up to the NestLoop. This is because the
not-need-to-distinguish-aware-or-not nature provided by the
ExecProcNode stuff.

On the other hand, with the new one, in order to do the same
thing, ExecAppend have in turn to behave differently whether the
parent is async or not. To do this will be bothersome but not
with confidence.

I examine this further intensively, especially for performance
degeneration and obstacles to complete this.

> Some notes:
>
> - EvalPlanQual rechecks are broken.
> - EXPLAIN ANALYZE instrumentation is broken.
> - ExecReScanAppend is broken, because the async stuff needs some way
> of canceling an async request and I didn't invent anything like that
> yet.
> - The postgres_fdw changes pretend to be async but aren't actually.
> It's just a demo of (part of) the interface at this point.
> - The postgres_fdw changes also report all pg-fdw paths as
> async-capable, but actually the direct-modify ones aren't, so the
> regression tests fail.
> - Errors in the executor can leak the WaitEventSet. Probably we need
> to modify ResourceOwners to be able to own WaitEventSets.
> - There are probably other bugs, too.
>
> Whee!
>
> Note that I've tried to solve the re-entrancy problems by (1) putting
> all of the event loop's state inside the EState rather than in local
> variables and (2) having the function that is called to report arrival
> of a result be thoroughly different than the function that is used to
> return a tuple to a synchronous caller.
>
> Comments welcome, if you're feeling brave enough to look at anything
> this half-baked.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2016-10-03 21:00:40
Message-ID:	CA+TgmoYMoB4OG1W6KZjgRda1J9=Lo1fXpH0YXjzvSEwU5rqhVA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Sep 28, 2016 at 12:30 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 24 September 2016 at 06:39, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Since Kyotaro Horiguchi found that my previous design had a
>> system-wide performance impact due to the ExecProcNode changes, I
>> decided to take a different approach here: I created an async
>> infrastructure where both the requestor and the requestee have to be
>> specifically modified to support parallelism, and then modified Append
>> and ForeignScan to cooperate using the new interface. Hopefully that
>> means that anything other than those two nodes will suffer no
>> performance impact. Of course, it might have other problems....
>
> I see that the reason why you re-designed the asynchronous execution
> implementation is because the earlier implementation showed
> performance degradation in local sequential and local parallel scans.
> But I checked that the ExecProcNode() changes were not that
> significant as to cause the degradation.

I think we need some testing to prove that one way or the other. If
you can do some - say on a plan with multiple nested loop joins with
inner index-scans, which will call ExecProcNode() a lot - that would
be great. I don't think we can just rely on "it doesn't seem like it
should be slower", though - ExecProcNode() is too important a function
for us to guess at what the performance will be.

The thing I'm really worried about with either implementation is what
happens when we start to add asynchronous capability to multiple
nodes. For example, if you imagine a plan like this:

Append
-> Hash Join
-> Foreign Scan
-> Hash
-> Seq Scan
-> Hash Join
-> Foreign Scan
-> Hash
-> Seq Scan

In order for this to run asynchronously, you need not only Append and
Foreign Scan to be async-capable, but also Hash Join. That's true in
either approach. Things are slightly better with the original
approach, but the basic problem is there in both cases. So it seems
we need an approach that will make adding async capability to a node
really cheap, which seems like it might be a problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2016-10-04 11:53:50
Message-ID:	CAJ3gD9f5t_ZP7RKv6r-Am=KPuui0JuLpOEhQfc3O3ODD0T6tmg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 4 October 2016 at 02:30, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Sep 28, 2016 at 12:30 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
>> On 24 September 2016 at 06:39, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Since Kyotaro Horiguchi found that my previous design had a
>>> system-wide performance impact due to the ExecProcNode changes, I
>>> decided to take a different approach here: I created an async
>>> infrastructure where both the requestor and the requestee have to be
>>> specifically modified to support parallelism, and then modified Append
>>> and ForeignScan to cooperate using the new interface. Hopefully that
>>> means that anything other than those two nodes will suffer no
>>> performance impact. Of course, it might have other problems....
>>
>> I see that the reason why you re-designed the asynchronous execution
>> implementation is because the earlier implementation showed
>> performance degradation in local sequential and local parallel scans.
>> But I checked that the ExecProcNode() changes were not that
>> significant as to cause the degradation.
>
> I think we need some testing to prove that one way or the other. If
> you can do some - say on a plan with multiple nested loop joins with
> inner index-scans, which will call ExecProcNode() a lot - that would
> be great. I don't think we can just rely on "it doesn't seem like it
> should be slower"
Agreed. I will come up with some tests.

> , though - ExecProcNode() is too important a function
> for us to guess at what the performance will be.

Also, parent pointers are not required in the new design. Thinking of
parent pointers, now it seems the event won't get bubbled up the tree
with the new design. But still, , I think it's possible to switch over
to the other asynchronous tree when some node in the current subtree
is waiting. But I am not sure, will think more on that.

>
> The thing I'm really worried about with either implementation is what
> happens when we start to add asynchronous capability to multiple
> nodes. For example, if you imagine a plan like this:
>
> Append
> -> Hash Join
> -> Foreign Scan
> -> Hash
> -> Seq Scan
> -> Hash Join
> -> Foreign Scan
> -> Hash
> -> Seq Scan
>
> In order for this to run asynchronously, you need not only Append and
> Foreign Scan to be async-capable, but also Hash Join. That's true in
> either approach. Things are slightly better with the original
> approach, but the basic problem is there in both cases. So it seems
> we need an approach that will make adding async capability to a node
> really cheap, which seems like it might be a problem.

Yes, we might have to deal with this.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2016-10-05 17:16:11
Message-ID:	CA+TgmoZSWnhy=SB3ggQcB6EqKxzbNeNn=EfwARnCS5tyhhBNcw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 4, 2016 at 7:53 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> Also, parent pointers are not required in the new design. Thinking of
> parent pointers, now it seems the event won't get bubbled up the tree
> with the new design. But still, , I think it's possible to switch over
> to the other asynchronous tree when some node in the current subtree
> is waiting. But I am not sure, will think more on that.

The bubbling-up still happens, because each node that made an async
request gets a callback with the final response - and if it is itself
the recipient of an async request, it can use that callback to respond
to that request in turn. This version doesn't bubble up through
non-async-aware nodes, but that might be a good thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-10-18 01:30:51
Message-ID:	20161018.103051.30820907.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, this works but ExecAppend gets a bit degradation.

At Mon, 03 Oct 2016 19:46:32 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20161003(dot)194632(dot)204401048(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > Some notes:
> >
> > - EvalPlanQual rechecks are broken.

This is fixed by adding (restoring) async-cancelation.

> > - EXPLAIN ANALYZE instrumentation is broken.

EXPLAIN ANALYE seems working but async-specific information is
not available yet.

> > - ExecReScanAppend is broken, because the async stuff needs some way
> > of canceling an async request and I didn't invent anything like that
> > yet.

Fixed as EvalPlanQual.

> > - The postgres_fdw changes pretend to be async but aren't actually.
> > It's just a demo of (part of) the interface at this point.

Applied my previous patch with some modification.

> > - The postgres_fdw changes also report all pg-fdw paths as
> > async-capable, but actually the direct-modify ones aren't, so the
> > regression tests fail.

All actions other than scan does vacate_connection() to use a
connection.

> > - Errors in the executor can leak the WaitEventSet. Probably we need
> > to modify ResourceOwners to be able to own WaitEventSets.

WaitEventSet itself is not leaked but epoll-fd should be closed
at failure. This seems doable with TRY-CATCHing in
ExecAsyncEventLoop. (not yet)

> > - There are probably other bugs, too.
> >
> > Whee!
> >
> > Note that I've tried to solve the re-entrancy problems by (1) putting
> > all of the event loop's state inside the EState rather than in local
> > variables and (2) having the function that is called to report arrival
> > of a result be thoroughly different than the function that is used to
> > return a tuple to a synchronous caller.
> >
> > Comments welcome, if you're feeling brave enough to look at anything
> > this half-baked.

This doesn't cause reentry since this no longer bubbles up
tupples through async-unaware nodes. This framework passes tuples
through private channels for requestor and requestees.

Anyway, I amended this and made postgres_fdw async and then
finally all regtests passed with minor modifications. The
attached patches are the following.

0001-robert-s-2nd-framework.patch
The patch Robert shown upthread

0002-Fix-some-bugs.patch
A small patch to fix complation errors of 0001.

0003-Modify-async-execution-infrastructure.patch
Several modifications on the infrastructure. The details are
shown after the measurement below.

0004-Make-postgres_fdw-async-capable.patch
True-async postgres-fdw.

gentblr.sql, testrun.sh, calc.pl
Performance test script suite.

gentblr.sql - creates test tables.
testrun.sh - does single test run and
calc.pl - drives testrunc.sh and summirize its results.

I measured performance and had the following result.

t0 - SELECT sum(a) FROM <local single table>;
pl - SELECT sum(a) FROM <4 local children>;
pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;

The result is written as "time<ms> (std dev <ms>)"

sync
t0: 3820.33 ( 1.88)
pl: 1608.59 ( 12.06)
pf0: 7928.29 ( 46.58)
pf1: 8023.16 ( 26.43)

async
t0: 3806.31 ( 4.49) 0.4% faster (should be error)
pl: 1629.17 ( 0.29) 1.3% slower
pf0: 6447.07 ( 25.19) 18.7% faster
pf1: 1876.80 ( 47.13) 76.6% faster

t0 is not affected since the ExecProcNode stuff has gone.

pl is getting a bit slower. (almost the same to simple seqscan of
the previous patch) This should be a misprediction penalty.

pf0, pf1 are faster as expected.

========

The below is a summary of modifications made by 0002 and 0003 patch.

execAsync.c, execnodes.h:

- Added include "pgstat.h" to use WAIT_EVENT_ASYNC_WAIT.

- Changed the interface of ExecAsyncRequest to return if a tuple is
immediately available or not.

- Made ExecAsyncConfigureWait to return if it registered at least
one waitevent or not. This is used to know the caller
(ExecAsyncEventWait) has a event to wait (for safety).

If two or more postgres_fdw nodes are sharing one connection,
only one of them can be waited at once. It is a
responsibility to the FDW drivers to ensure at least one wait
event to be added but on failure WaitEventSetWait silently
waits forever.

- There were separate areq->callback_pending and
areq->request_complete but they are altering together so they are
replaced with one state variable areq->state. New enum
AsyncRequestState for areq->state in execnodes.h.

nodeAppend.c:

- Return a tuple immediately if ExecAsyncRequest says that a
tuple is available.

- Reduced nest level of for(;;).

nodeForeignscan.[ch], fdwapi.h, execProcnode.c::

- Calling postgresIterateForeignScan can yield tuples in wrong
shape. Call ExecForeignScan instead.

- Changed the interface of AsyncConfigureWait as execAsync.c.

- Added ShutdownForeignScan interface.

createplan.c, ruleutils.c, plannodes.h:

- With the Rebert's change, explain shows somewhat odd plans
where the Output of Append is named after non-parent
child. This does not harm but uneasy. Added index of the
parent in Append.referent to make it reasoable. (But this
looks ugly..). Still children in explain are in different
order from definition. (expected/postgres_fdw.out is edited)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.7 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
unknown_filename	text/plain	3.3 KB
unknown_filename	text/plain	449 bytes
unknown_filename	text/plain	808 bytes

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-10-25 09:21:50
Message-ID:	20161025.182150.230901487.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This is the rebased version on the current master(-0004), and
added resowner stuff (0005) and unlikely(0006).

At Tue, 18 Oct 2016 10:30:51 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20161018(dot)103051(dot)30820907(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > - Errors in the executor can leak the WaitEventSet. Probably we need
> > > to modify ResourceOwners to be able to own WaitEventSets.
>
> WaitEventSet itself is not leaked but epoll-fd should be closed
> at failure. This seems doable with TRY-CATCHing in
> ExecAsyncEventLoop. (not yet)

Haha, that's a silly talk. The wait event can continue to live
when timeout and any error can happen on the way after the
that. I added an entry for wait event set to resource owner and
hang ones created in ExecAsyncEventWait to
TopTransactionResourceOwner. Currently WaitLatchOrSocket doesn't
do so not to change the current behavior. WaitEventSet doesn't
have usable identifier for resowner.c so currently I use the
address(pointer value) for the purpose. The patch 0005 does that.

> I measured performance and had the following result.
>
> t0 - SELECT sum(a) FROM <local single table>;
> pl - SELECT sum(a) FROM <4 local children>;
> pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
>
> The result is written as "time<ms> (std dev <ms>)"
>
> sync
> t0: 3820.33 ( 1.88)
> pl: 1608.59 ( 12.06)
> pf0: 7928.29 ( 46.58)
> pf1: 8023.16 ( 26.43)
>
> async
> t0: 3806.31 ( 4.49) 0.4% faster (should be error)
> pl: 1629.17 ( 0.29) 1.3% slower
> pf0: 6447.07 ( 25.19) 18.7% faster
> pf1: 1876.80 ( 47.13) 76.6% faster
>
> t0 is not affected since the ExecProcNode stuff has gone.
>
> pl is getting a bit slower. (almost the same to simple seqscan of
> the previous patch) This should be a misprediction penalty.

Using likely macro for ExecAppend, and it seems to have shaken
off the degradation.

sync
t0: 3919.49 ( 5.95)
pl: 1637.95 ( 0.75)
pf0: 8304.20 ( 43.94)
pf1: 8222.09 ( 28.20)

async
t0: 3885.84 ( 40.20) 0.86% faster (should be error but stable on my env..)
pl: 1617.20 ( 3.51) 1.26% faster (ditto)
pf0: 6680.95 (478.72) 19.5% faster
pf1: 1886.87 ( 36.25) 77.1% faster

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.6 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch	text/x-patch	8.9 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-10-25 10:22:52
Message-ID:	20161025.192252.214527571.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, this is the 7th patch to make instrumentation work.

Explain analyze shows the following result by the previous patch set .

| Aggregate (cost=820.25..820.26 rows=1 width=8) (actual time=4324.676..4324.676
| rows=1 loops=1)
| -> Append (cost=0.00..791.00 rows=11701 width=4) (actual time=0.910..3663.8
|82 rows=4000000 loops=1)
| -> Foreign Scan on ft10 (cost=100.00..197.75 rows=2925 width=4)
| (never executed)
| -> Foreign Scan on ft20 (cost=100.00..197.75 rows=2925 width=4)
| (never executed)
| -> Foreign Scan on ft30 (cost=100.00..197.75 rows=2925 width=4)
| (never executed)
| -> Foreign Scan on ft40 (cost=100.00..197.75 rows=2925 width=4)
| (never executed)
| -> Seq Scan on pf0 (cost=0.00..0.00 rows=1 width=4)
| (actual time=0.004..0.004 rows=0 loops=1)

The current instrument stuff assumes that requested tuple always
returns a tuple or the end of tuple comes. This async framework
has two point of executing underneath nodes. ExecAsyncRequest and
ExecAsyncEventLoop. So I'm not sure if this is appropriate but
anyway it seems to show sane numbers.

| Aggregate (cost=820.25..820.26 rows=1 width=8) (actual time=4571.205..4571.206
| rows=1 loops=1)
| -> Append (cost=0.00..791.00 rows=11701 width=4) (actual time=1.362..3893.1
|14 rows=4000000 loops=1)
| -> Foreign Scan on ft10 (cost=100.00..197.75 rows=2925 width=4)
| (actual time=1.056..770.863 rows=1000000 loops=1)
| -> Foreign Scan on ft20 (cost=100.00..197.75 rows=2925 width=4)
| (actual time=0.461..767.840 rows=1000000 loops=1)
| -> Foreign Scan on ft30 (cost=100.00..197.75 rows=2925 width=4)
| (actual time=0.474..782.547 rows=1000000 loops=1)
| -> Foreign Scan on ft40 (cost=100.00..197.75 rows=2925 width=4)
| (actual time=0.156..765.920 rows=1000000 loops=1)
| -> Seq Scan on pf0 (cost=0.00..0.00 rows=1 width=4) (never executed)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0007-Add-instrumentation-to-async-execution.patch	text/x-patch	2.9 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-10-31 01:39:12
Message-ID:	20161031.103912.217430542.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

I'm not sure this is in a sutable shape for commit fest but I
decided to register this to ride on the bus for 10.0.

> Hi, this is the 7th patch to make instrumentation work.

This a PoC patch of asynchronous execution feature, based on a
executor infrastructure Robert proposed. These patches are
rebased on the current master.

0001-robert-s-2nd-framework.patch

Roberts executor async infrastructure. Async-driver nodes
register its async-capable children and sync and data transfer
are done out of band of ordinary ExecProcNode channel. So async
execution no longer disturbs async-unaware node and slows them
down.

0002-Fix-some-bugs.patch

Some fixes for 0001 to work. This is just to preserve the shape
of 0001 patch.

0003-Modify-async-execution-infrastructure.patch

The original infrastructure doesn't work when multiple foreign
tables is on the same connection. This makes it work.

0004-Make-postgres_fdw-async-capable.patch

Makes postgres_fdw to work asynchronously.

0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch

This addresses a problem pointed by Robers about 0001 patch,
that WaitEventSet used for async execution can leak by errors.

0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch

ExecAppend gets a bit slower by penalties of misprediction of
branches. This fixes it by using unlikely() macro.

0007-Add-instrumentation-to-async-execution.patch

As the description above for 0001, async infrastructure conveys
tuples outside ExecProcNode channel so EXPLAIN ANALYZE requires
special treat to show sane results. This patch tries that.

A result of a performance measurement is in this message.

https://www.postgresql.org/message-id/20161025.182150.230901487.horiguchi.kyotaro@lab.ntt.co.jp

| t0 - SELECT sum(a) FROM <local single table>;
| pl - SELECT sum(a) FROM <4 local children>;
| pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
| pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
...
| async
| t0: 3885.84 ( 40.20) 0.86% faster (should be error but stable on my env..)
| pl: 1617.20 ( 3.51) 1.26% faster (ditto)
| pf0: 6680.95 (478.72) 19.5% faster
| pf1: 1886.87 ( 36.25) 77.1% faster

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.6 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch	text/x-patch	8.9 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0007-Add-instrumentation-to-async-execution.patch	text/x-patch	2.9 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-11-15 11:25:13
Message-ID:	20161115.202513.268072050.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, this is a maintenance post of reased patches.
I added a change of ResourceOwnerData missed in 0005.

At Mon, 31 Oct 2016 10:39:12 +0900 (JST), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20161031(dot)103912(dot)217430542(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> This a PoC patch of asynchronous execution feature, based on a
> executor infrastructure Robert proposed. These patches are
> rebased on the current master.
>
> 0001-robert-s-2nd-framework.patch
>
> Roberts executor async infrastructure. Async-driver nodes
> register its async-capable children and sync and data transfer
> are done out of band of ordinary ExecProcNode channel. So async
> execution no longer disturbs async-unaware node and slows them
> down.
>
> 0002-Fix-some-bugs.patch
>
> Some fixes for 0001 to work. This is just to preserve the shape
> of 0001 patch.
>
> 0003-Modify-async-execution-infrastructure.patch
>
> The original infrastructure doesn't work when multiple foreign
> tables is on the same connection. This makes it work.
>
> 0004-Make-postgres_fdw-async-capable.patch
>
> Makes postgres_fdw to work asynchronously.
>
> 0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch
>
> This addresses a problem pointed by Robers about 0001 patch,
> that WaitEventSet used for async execution can leak by errors.
>
> 0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch
>
> ExecAppend gets a bit slower by penalties of misprediction of
> branches. This fixes it by using unlikely() macro.
>
> 0007-Add-instrumentation-to-async-execution.patch
>
> As the description above for 0001, async infrastructure conveys
> tuples outside ExecProcNode channel so EXPLAIN ANALYZE requires
> special treat to show sane results. This patch tries that.
>
>
> A result of a performance measurement is in this message.
>
> https://www.postgresql.org/message-id/20161025.182150.230901487.horiguchi.kyotaro@lab.ntt.co.jp
>
>
> | t0 - SELECT sum(a) FROM <local single table>;
> | pl - SELECT sum(a) FROM <4 local children>;
> | pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> | pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
> ...
> | async
> | t0: 3885.84 ( 40.20) 0.86% faster (should be error but stable on my env..)
> | pl: 1617.20 ( 3.51) 1.26% faster (ditto)
> | pf0: 6680.95 (478.72) 19.5% faster
> | pf1: 1886.87 ( 36.25) 77.1% faster

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.6 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch	text/x-patch	8.9 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0007-Add-instrumentation-to-async-execution.patch	text/x-patch	2.9 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-11-28 10:22:47
Message-ID:	20161128.192247.05277061.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

I cannot respond until next Monday, so I move this to the next CF
by myself.

At Tue, 15 Nov 2016 20:25:13 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20161115(dot)202513(dot)268072050(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Hello, this is a maintenance post of reased patches.
> I added a change of ResourceOwnerData missed in 0005.
>
> At Mon, 31 Oct 2016 10:39:12 +0900 (JST), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20161031(dot)103912(dot)217430542(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > This a PoC patch of asynchronous execution feature, based on a
> > executor infrastructure Robert proposed. These patches are
> > rebased on the current master.
> >
> > 0001-robert-s-2nd-framework.patch
> >
> > Roberts executor async infrastructure. Async-driver nodes
> > register its async-capable children and sync and data transfer
> > are done out of band of ordinary ExecProcNode channel. So async
> > execution no longer disturbs async-unaware node and slows them
> > down.
> >
> > 0002-Fix-some-bugs.patch
> >
> > Some fixes for 0001 to work. This is just to preserve the shape
> > of 0001 patch.
> >
> > 0003-Modify-async-execution-infrastructure.patch
> >
> > The original infrastructure doesn't work when multiple foreign
> > tables is on the same connection. This makes it work.
> >
> > 0004-Make-postgres_fdw-async-capable.patch
> >
> > Makes postgres_fdw to work asynchronously.
> >
> > 0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch
> >
> > This addresses a problem pointed by Robers about 0001 patch,
> > that WaitEventSet used for async execution can leak by errors.
> >
> > 0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch
> >
> > ExecAppend gets a bit slower by penalties of misprediction of
> > branches. This fixes it by using unlikely() macro.
> >
> > 0007-Add-instrumentation-to-async-execution.patch
> >
> > As the description above for 0001, async infrastructure conveys
> > tuples outside ExecProcNode channel so EXPLAIN ANALYZE requires
> > special treat to show sane results. This patch tries that.
> >
> >
> > A result of a performance measurement is in this message.
> >
> > https://www.postgresql.org/message-id/20161025.182150.230901487.horiguchi.kyotaro@lab.ntt.co.jp
> >
> >
> > | t0 - SELECT sum(a) FROM <local single table>;
> > | pl - SELECT sum(a) FROM <4 local children>;
> > | pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> > | pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
> > ...
> > | async
> > | t0: 3885.84 ( 40.20) 0.86% faster (should be error but stable on my env..)
> > | pl: 1617.20 ( 3.51) 1.26% faster (ditto)
> > | pf0: 6680.95 (478.72) 19.5% faster
> > | pf1: 1886.87 ( 36.25) 77.1% faster

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2016-12-21 08:23:58
Message-ID:	20161221.172358.258649462.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This patch conflicts with e13029a (es_query_dsa) so I rebased
this.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.6 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch	text/x-patch	8.9 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0007-Add-instrumentation-to-async-execution.patch	text/x-patch	2.9 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-01-31 03:45:48
Message-ID:	20170131.124548.85146517.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I noticed that this patch is conflicting with 665d1fa (Logical
replication) so I rebased this. Only executor/Makefile
conflicted.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.6 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch	text/x-patch	8.8 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0007-Add-instrumentation-to-async-execution.patch	text/x-patch	2.9 KB

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, amitdkhan(dot)pg(at)gmail(dot)com, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2017-02-01 05:11:58
Message-ID:	CAB7nPqS0MhZrzgMVQeFEnnKABcsMnNULd8=O0PG7_h-FUp5aEQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 31, 2017 at 12:45 PM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I noticed that this patch is conflicting with 665d1fa (Logical
> replication) so I rebased this. Only executor/Makefile
> conflicted.

The patches still apply, moved to CF 2017-03. Be aware of that:
$ git diff HEAD~6 --check
contrib/postgres_fdw/postgres_fdw.c:388: indent with spaces.
+ PendingAsyncRequest *areq,
contrib/postgres_fdw/postgres_fdw.c:389: indent with spaces.
+ bool reinit);
src/backend/utils/resowner/resowner.c:1332: new blank line at EOF.
--
Michael

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	michael(dot)paquier(at)gmail(dot)com
Cc:	robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-01 05:58:23
Message-ID:	20170201.145823.96342624.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you.

At Wed, 1 Feb 2017 14:11:58 +0900, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote in <CAB7nPqS0MhZrzgMVQeFEnnKABcsMnNULd8=O0PG7_h-FUp5aEQ(at)mail(dot)gmail(dot)com>
> On Tue, Jan 31, 2017 at 12:45 PM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > I noticed that this patch is conflicting with 665d1fa (Logical
> > replication) so I rebased this. Only executor/Makefile
> > conflicted.
>
> The patches still apply, moved to CF 2017-03. Be aware of that:
> $ git diff HEAD~6 --check
> contrib/postgres_fdw/postgres_fdw.c:388: indent with spaces.
> + PendingAsyncRequest *areq,
> contrib/postgres_fdw/postgres_fdw.c:389: indent with spaces.
> + bool reinit);
> src/backend/utils/resowner/resowner.c:1332: new blank line at EOF.

Thank you for letting me know the command. I changed my check
scripts to use them and it seems working fine on both commit and
rebase.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-03 10:04:08
Message-ID:	28186.1486116248@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> I noticed that this patch is conflicting with 665d1fa (Logical
> replication) so I rebased this. Only executor/Makefile
> conflicted.

I was lucky enough to see an infinite loop when using this patch, which I
fixed by this change:

diff --git a/src/backend/executor/execAsync.c b/src/backend/executor/execAsync.c
new file mode 100644
index 588ba18..9b87fbd
*** a/src/backend/executor/execAsync.c
--- b/src/backend/executor/execAsync.c
*************** ExecAsyncEventWait(EState *estate, long
*** 364,369 ****
--- 364,370 ----

if ((w->events & WL_LATCH_SET) != 0)
{
+ ResetLatch(MyLatch);
process_latch_set = true;
continue;
}

Actually _almost_ fixed because at some point one of the following

Assert(areq->state == ASYNC_WAITING);

statements fired. I think it was the immediately following one, but I can
imagine the same to happen in the branch

if (process_latch_set)
...

I think the wants_process_latch field of PendingAsyncRequest is not useful
alone because the process latch can be set for reasons completely unrelated to
the asynchronous processing. If the asynchronous node should use latch to
signal it's readiness, I think an additional flag is needed in the request
which tells ExecAsyncEventWait that the latch was set by the asynchronous
node.

BTW, do we really need the ASYNC_CALLBACK_PENDING state? I can imagine the
async node either to change ASYNC_WAITING directly to ASYNC_COMPLETE, or leave
it ASYNC_WAITING if the data is not ready.

In addition, the following comments are based only on code review, I didn't
verify my understanding experimentally:

* Isn't it possible for AppendState.as_asyncresult to contain multiple
responses from the same async node? Since the array stores TupleTableSlot
instead of the actual tuple (so multiple items of as_asyncresult point to
the same slot), I suspect the slot contents might not be defined when the
Append node eventually tries to return it to the upper plan.

* For the WaitEvent subsystem to work, I think postgres_fdw should keep a
separate libpq connection per node, not per user mapping. Currently the
connections are cached by user mapping, but it's legal to locate multiple
child postgres_fdw nodes of Append plan on the same remote server. I expect
that these "co-located" nodes would currently use the same user mapping and
therefore the same connection.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Antonin Houska <ah(at)cybertec(dot)at>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-07 00:36:33
Message-ID:	CADkLM=fuvVdKvz92XpCRnb4=rj6bLOhSLifQ3RV=Sb4Q5rJsRA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Feb 3, 2017 at 5:04 AM, Antonin Houska <ah(at)cybertec(dot)at> wrote:

> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> > I noticed that this patch is conflicting with 665d1fa (Logical
> > replication) so I rebased this. Only executor/Makefile
> > conflicted.
>
> I was lucky enough to see an infinite loop when using this patch, which I
> fixed by this change:
>
> diff --git a/src/backend/executor/execAsync.c
> b/src/backend/executor/execAsync.c
> new file mode 100644
> index 588ba18..9b87fbd
> *** a/src/backend/executor/execAsync.c
> --- b/src/backend/executor/execAsync.c
> *************** ExecAsyncEventWait(EState *estate, long
> *** 364,369 ****
> --- 364,370 ----
>
> if ((w->events & WL_LATCH_SET) != 0)
> {
> + ResetLatch(MyLatch);
> process_latch_set = true;
> continue;
> }

Hi, I've been testing this patch because seemed like it would help a use
case of mine, but can't tell if it's currently working for cases other than
a local parent table that has many child partitions which happen to be
foreign tables. Is it? I was hoping to use it for a case like:

select x, sum(y) from one_remote_table
union all
select x, sum(y) from another_remote_table
union all
select x, sum(y) from a_third_remote_table

but while aggregates do appear to be pushed down, it seems that the remote
tables are being queried in sequence. Am I doing something wrong?

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com
Cc:	amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-07 04:28:42
Message-ID:	9058d70b-a6b0-8b3c-091a-fe77ed0df580@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Horiguchi-san,

On 2017/01/31 12:45, Kyotaro HORIGUCHI wrote:
> I noticed that this patch is conflicting with 665d1fa (Logical
> replication) so I rebased this. Only executor/Makefile
> conflicted.

With the latest set of patches, I observe a crash due to an Assert failure:

#0 0x0000003969632625 in *__GI_raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003969633e05 in *__GI_abort () at abort.c:92
#2 0x000000000098b22c in ExceptionalCondition (conditionName=0xb30e02
"!(added)", errorType=0xb30d77 "FailedAssertion", fileName=0xb30d50
"execAsync.c",
lineNumber=345) at assert.c:54
#3 0x00000000006883ed in ExecAsyncEventWait (estate=0x13c01b8,
timeout=-1) at execAsync.c:345
#4 0x0000000000687ed5 in ExecAsyncEventLoop (estate=0x13c01b8,
requestor=0x13c1640, timeout=-1) at execAsync.c:186
#5 0x00000000006a5170 in ExecAppend (node=0x13c1640) at nodeAppend.c:257
#6 0x0000000000692b9b in ExecProcNode (node=0x13c1640) at execProcnode.c:411
#7 0x00000000006bf4d7 in ExecResult (node=0x13c1170) at nodeResult.c:113
#8 0x0000000000692b5c in ExecProcNode (node=0x13c1170) at execProcnode.c:399
#9 0x00000000006a596b in fetch_input_tuple (aggstate=0x13c06a0) at
nodeAgg.c:587
#10 0x00000000006a8530 in agg_fill_hash_table (aggstate=0x13c06a0) at
nodeAgg.c:2272
#11 0x00000000006a7e76 in ExecAgg (node=0x13c06a0) at nodeAgg.c:1910
#12 0x0000000000692d69 in ExecProcNode (node=0x13c06a0) at execProcnode.c:514
#13 0x00000000006c1a42 in ExecSort (node=0x13c03d0) at nodeSort.c:103
#14 0x0000000000692d3f in ExecProcNode (node=0x13c03d0) at execProcnode.c:506
#15 0x000000000068e733 in ExecutePlan (estate=0x13c01b8,
planstate=0x13c03d0, use_parallel_mode=0 '\000', operation=CMD_SELECT,
sendTuples=1 '\001',
numberTuples=0, direction=ForwardScanDirection, dest=0x7fa368ee1da8)
at execMain.c:1609
#16 0x000000000068c751 in standard_ExecutorRun (queryDesc=0x135c568,
direction=ForwardScanDirection, count=0) at execMain.c:341
#17 0x000000000068c5dc in ExecutorRun (queryDesc=0x135c568,
<snip>

I was running a query whose plan looked like:

explain (costs off) select tableoid::regclass, a, min(b), max(b) from ptab
group by 1,2 order by 1;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: ((ptab.tableoid)::regclass)
-> HashAggregate
Group Key: (ptab.tableoid)::regclass, ptab.a
-> Result
-> Append
-> Foreign Scan on ptab_00001
-> Foreign Scan on ptab_00002
-> Foreign Scan on ptab_00003
-> Foreign Scan on ptab_00004
-> Foreign Scan on ptab_00005
-> Foreign Scan on ptab_00006
-> Foreign Scan on ptab_00007
-> Foreign Scan on ptab_00008
-> Foreign Scan on ptab_00009
-> Foreign Scan on ptab_00010
<snip>

The snipped part contains Foreign Scans on 90 more foreign partitions (in
fact, I could see the crash even with 10 foreign table partitions for the
same query).

There is a crash in one more case, which seems related to how WaitEventSet
objects are manipulated during resource-owner-mediated cleanup of a failed
query, such as after the FDW returned an error like below:

ERROR: relation "public.ptab_00010" does not exist
CONTEXT: Remote SQL command: SELECT a, b FROM public.ptab_00010

The backtrace in this looks like below:

Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000009c4c35 in ResourceArrayRemove (resarr=0x7f7f7f7f7f7f80bf,
value=20645152) at resowner.c:301
301 lastidx = resarr->lastidx;
(gdb)
(gdb) bt
#0 0x00000000009c4c35 in ResourceArrayRemove (resarr=0x7f7f7f7f7f7f80bf,
value=20645152) at resowner.c:301
#1 0x00000000009c6578 in ResourceOwnerForgetWES
(owner=0x7f7f7f7f7f7f7f7f, events=0x13b0520) at resowner.c:1317
#2 0x0000000000806098 in FreeWaitEventSet (set=0x13b0520) at latch.c:600
#3 0x00000000009c5338 in ResourceOwnerReleaseInternal (owner=0x12de768,
phase=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=0 '\000', isTopLevel=1 '\001')
at resowner.c:566
#4 0x00000000009c5155 in ResourceOwnerRelease (owner=0x12de768,
phase=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=0 '\000', isTopLevel=1
'\001') at resowner.c:485
#5 0x0000000000524172 in AbortTransaction () at xact.c:2588
#6 0x0000000000524854 in AbortCurrentTransaction () at xact.c:3016
#7 0x0000000000836aa6 in PostgresMain (argc=1, argv=0x12d7b08,
dbname=0x12d7968 "postgres", username=0x12d7948 "amit") at postgres.c:3860
#8 0x00000000007a49d8 in BackendRun (port=0x12cdf00) at postmaster.c:4310
#9 0x00000000007a4151 in BackendStartup (port=0x12cdf00) at postmaster.c:3982
#10 0x00000000007a0885 in ServerLoop () at postmaster.c:1722
#11 0x000000000079febf in PostmasterMain (argc=3, argv=0x12aacc0) at
postmaster.c:1330
#12 0x00000000006e7549 in main (argc=3, argv=0x12aacc0) at main.c:228

There is a segfault when accessing the events variable, whose members seem
to be pfreed:

(gdb) f 2
#2 0x0000000000806098 in FreeWaitEventSet (set=0x13b0520) at latch.c:600
600 ResourceOwnerForgetWES(set->resowner, set);
(gdb) p *set
$5 = {
nevents = 2139062143,
nevents_space = 2139062143,
resowner = 0x7f7f7f7f7f7f7f7f,
events = 0x7f7f7f7f7f7f7f7f,
latch = 0x7f7f7f7f7f7f7f7f,
latch_pos = 2139062143,
epoll_fd = 2139062143,
epoll_ret_events = 0x7f7f7f7f7f7f7f7f
}

Thanks,
Amit

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc:	robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-16 12:06:00
Message-ID:	20170216.210600.214980879.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you very much for testing this!

At Tue, 7 Feb 2017 13:28:42 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <9058d70b-a6b0-8b3c-091a-fe77ed0df580(at)lab(dot)ntt(dot)co(dot)jp>
> Horiguchi-san,
>
> On 2017/01/31 12:45, Kyotaro HORIGUCHI wrote:
> > I noticed that this patch is conflicting with 665d1fa (Logical
> > replication) so I rebased this. Only executor/Makefile
> > conflicted.
>
> With the latest set of patches, I observe a crash due to an Assert failure:
>
> #3 0x00000000006883ed in ExecAsyncEventWait (estate=0x13c01b8,
> timeout=-1) at execAsync.c:345

This means no pending fdw scan didn't let itself go to waiting
stage. It leads to a stuck of the whole things. This is caused if
no one acutually is waiting for result. I suppose that all of the
foreign scans ran on the same connection. Anyway it should be a
mistake in state transition. I'll look into it.

> I was running a query whose plan looked like:
>
> explain (costs off) select tableoid::regclass, a, min(b), max(b) from ptab
> group by 1,2 order by 1;
> QUERY PLAN
> ------------------------------------------------------
> Sort
> Sort Key: ((ptab.tableoid)::regclass)
> -> HashAggregate
> Group Key: (ptab.tableoid)::regclass, ptab.a
> -> Result
> -> Append
> -> Foreign Scan on ptab_00001
> -> Foreign Scan on ptab_00002
> -> Foreign Scan on ptab_00003
> -> Foreign Scan on ptab_00004
> -> Foreign Scan on ptab_00005
> -> Foreign Scan on ptab_00006
> -> Foreign Scan on ptab_00007
> -> Foreign Scan on ptab_00008
> -> Foreign Scan on ptab_00009
> -> Foreign Scan on ptab_00010
> <snip>
>
> The snipped part contains Foreign Scans on 90 more foreign partitions (in
> fact, I could see the crash even with 10 foreign table partitions for the
> same query).

Yeah, it seems to me unrelated to how many they are.

> There is a crash in one more case, which seems related to how WaitEventSet
> objects are manipulated during resource-owner-mediated cleanup of a failed
> query, such as after the FDW returned an error like below:
>
> ERROR: relation "public.ptab_00010" does not exist
> CONTEXT: Remote SQL command: SELECT a, b FROM public.ptab_00010
>
> The backtrace in this looks like below:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 0x00000000009c4c35 in ResourceArrayRemove (resarr=0x7f7f7f7f7f7f80bf,
> value=20645152) at resowner.c:301
> 301 lastidx = resarr->lastidx;
> (gdb)
> (gdb) bt
> #0 0x00000000009c4c35 in ResourceArrayRemove (resarr=0x7f7f7f7f7f7f80bf,
> value=20645152) at resowner.c:301
> #1 0x00000000009c6578 in ResourceOwnerForgetWES
> (owner=0x7f7f7f7f7f7f7f7f, events=0x13b0520) at resowner.c:1317
> #2 0x0000000000806098 in FreeWaitEventSet (set=0x13b0520) at latch.c:600
> #3 0x00000000009c5338 in ResourceOwnerReleaseInternal (owner=0x12de768,
> phase=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=0 '\000', isTopLevel=1 '\001')
> at resowner.c:566
> #4 0x00000000009c5155 in ResourceOwnerRelease (owner=0x12de768,
> phase=RESOURCE_RELEASE_BEFORE_LOCKS, isCommit=0 '\000', isTopLevel=1
> '\001') at resowner.c:485
> #5 0x0000000000524172 in AbortTransaction () at xact.c:2588
> #6 0x0000000000524854 in AbortCurrentTransaction () at xact.c:3016
> #7 0x0000000000836aa6 in PostgresMain (argc=1, argv=0x12d7b08,
> dbname=0x12d7968 "postgres", username=0x12d7948 "amit") at postgres.c:3860
> #8 0x00000000007a49d8 in BackendRun (port=0x12cdf00) at postmaster.c:4310
> #9 0x00000000007a4151 in BackendStartup (port=0x12cdf00) at postmaster.c:3982
> #10 0x00000000007a0885 in ServerLoop () at postmaster.c:1722
> #11 0x000000000079febf in PostmasterMain (argc=3, argv=0x12aacc0) at
> postmaster.c:1330
> #12 0x00000000006e7549 in main (argc=3, argv=0x12aacc0) at main.c:228
>
> There is a segfault when accessing the events variable, whose members seem
> to be pfreed:
>
> (gdb) f 2
> #2 0x0000000000806098 in FreeWaitEventSet (set=0x13b0520) at latch.c:600
> 600 ResourceOwnerForgetWES(set->resowner, set);
> (gdb) p *set
> $5 = {
> nevents = 2139062143,
> nevents_space = 2139062143,
> resowner = 0x7f7f7f7f7f7f7f7f,
> events = 0x7f7f7f7f7f7f7f7f,
> latch = 0x7f7f7f7f7f7f7f7f,
> latch_pos = 2139062143,
> epoll_fd = 2139062143,
> epoll_ret_events = 0x7f7f7f7f7f7f7f7f
> }

Mmm, I reproduces it quite easily. A silly bug.

Something bad is happening between freeing ExecutorState memory
context and resource owner. Perhaps the ExecutorState is freed by
resowner (as a part of its anscestors) before the memory for the
WaitEventSet is freed. It was careless of me. I'll reconsider it.

Great thanks for the report.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc:	robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-22 08:39:45
Message-ID:	20170222.173945.262776579.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Thu, 16 Feb 2017 21:06:00 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170216(dot)210600(dot)214980879(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > #3 0x00000000006883ed in ExecAsyncEventWait (estate=0x13c01b8,
> > timeout=-1) at execAsync.c:345
>
> This means no pending fdw scan didn't let itself go to waiting
> stage. It leads to a stuck of the whole things. This is caused if
> no one acutually is waiting for result. I suppose that all of the
> foreign scans ran on the same connection. Anyway it should be a
> mistake in state transition. I'll look into it.
...
> > I was running a query whose plan looked like:
> >
> > explain (costs off) select tableoid::regclass, a, min(b), max(b) from ptab
> > group by 1,2 order by 1;
> > QUERY PLAN
> > ------------------------------------------------------
> > Sort
> > Sort Key: ((ptab.tableoid)::regclass)
> > -> HashAggregate
> > Group Key: (ptab.tableoid)::regclass, ptab.a
> > -> Result
> > -> Append
> > -> Foreign Scan on ptab_00001
> > -> Foreign Scan on ptab_00002
> > -> Foreign Scan on ptab_00003
> > -> Foreign Scan on ptab_00004
> > -> Foreign Scan on ptab_00005
> > -> Foreign Scan on ptab_00006
> > -> Foreign Scan on ptab_00007
> > -> Foreign Scan on ptab_00008
> > -> Foreign Scan on ptab_00009
> > -> Foreign Scan on ptab_00010
> > <snip>
> >
> > The snipped part contains Foreign Scans on 90 more foreign partitions (in
> > fact, I could see the crash even with 10 foreign table partitions for the
> > same query).
>
> Yeah, it seems to me unrelated to how many they are.

Finally, I couldn't see the crash for the (maybe) same case. I
can guess two reasons for this. One is that a situation where
node->as_nasyncpending differs from estate->es_num_pending_async,
but I couldn't find a possibility. Another is a situation in
postgresIterateForeignScan where the "next owner" reaches eof but
another waiter is not. I haven't reproduce the situation but
fixed it for the case. Addition to that I found a bug in
ExecAsyncAppendResponse. It calls bms_add_member inappropriate
way.

> Mmm, I reproduces it quite easily. A silly bug.
>
> Something bad is happening between freeing ExecutorState memory
> context and resource owner. Perhaps the ExecutorState is freed by
> resowner (as a part of its anscestors) before the memory for the
> WaitEventSet is freed. It was careless of me. I'll reconsider it.

The cause was that the WaitEventSet was placed in ExecutorState
but registered to TopTransactionResourceOwner. I fixed it.

This fixes are made on top of the previous patches for now. In
the attached files, 0008, 0009 are for the second bug, 0012 is
for the first bug. And 0013 is for bms bug.

Sorry for the confused patches, I will resend more neater ones
soon.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0013-Fix-a-bug-of-a-usage-of-bms_add_member.patch	text/x-patch	1.0 KB
0012-Fix-a-possible-bug.patch	text/x-patch	1.1 KB
0011-Some-non-functional-fixes.patch	text/x-patch	15.8 KB
0010-Fix-a-typo-of-mcxt.c.patch	text/x-patch	936 bytes
0009-Fix-the-resource-owner-to-be-used.patch	text/x-patch	2.1 KB
0008-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	4.4 KB
0007-Add-instrumentation-to-async-execution.patch	text/x-patch	2.9 KB
0006-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0005-Use-resource-owner-to-prevent-wait-event-set-from-le.patch	text/x-patch	8.9 KB
0004-Make-postgres_fdw-async-capable.patch	text/x-patch	43.2 KB
0003-Modify-async-execution-infrastructure.patch	text/x-patch	29.7 KB
0002-Fix-some-bugs.patch	text/x-patch	16.3 KB
0001-robert-s-2nd-framework.patch	text/x-patch	42.9 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc:	robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-02-23 11:59:25
Message-ID:	20170223.205925.120810176.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello, I totally reorganized the patch set to four pathces on the
current master (9e43e87).

At Wed, 22 Feb 2017 17:39:45 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170222(dot)173945(dot)262776579(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Finally, I couldn't see the crash for the (maybe) same case. I
> can guess two reasons for this. One is that a situation where
> node->as_nasyncpending differs from estate->es_num_pending_async,
> but I couldn't find a possibility. Another is a situation in
> postgresIterateForeignScan where the "next owner" reaches eof but
> another waiter is not. I haven't reproduce the situation but
> fixed it for the case. Addition to that I found a bug in
> ExecAsyncAppendResponse. It calls bms_add_member inappropriate
> way.

This found to be wrong. The true problem here was (maybe) that
ExecAsyncRequest can complete a tuple immediately. This causes
multiple calling to ExecAsyncRequest for the same child at
once. (For the case, the processing node is added again to
node->as_needrequest before ExecAsyncRequest returns.)

Using a copy of node->as_needrequest will fix this but it is
uneasy so I changed ExecAsyncRequest not to return a tuple
immediately. Instaed, ExecAsyncEventLoop skips waiting if no node
to wait. The tuple previously "response"'ed in ExecAsyncRequest
is now responsed here.

Addition to that, the current policy of preserving of
es_wait_event_set doesn't seem to work with the async-capable
postgres_fdw. So the current code cleares it at every entering to
ExecAppend. This needs more thoughts.

I measured the performance of async-execution and it was quite
good from the previous version especially for single-connection
environment.

pf0: 4 foreign tables on single connection
non async : (prev) 7928ms -> (this time)7993ms
async : (prev) 6447ms -> (this time)3211ms

pf1: 4 foreign tables on dedicate connection for every table
non async : (prev) 8023ms -> (this time)7953ms
async : (prev) 1876ms -> (this time)1841ms

Boost rate by async execution is 60% for single connectsion and
77% for dedicate connection environment.

> > Mmm, I reproduces it quite easily. A silly bug.
> >
> > Something bad is happening between freeing ExecutorState memory
> > context and resource owner. Perhaps the ExecutorState is freed by
> > resowner (as a part of its anscestors) before the memory for the
> > WaitEventSet is freed. It was careless of me. I'll reconsider it.
>
> The cause was that the WaitEventSet was placed in ExecutorState
> but registered to TopTransactionResourceOwner. I fixed it.

The attached patches are the following.

0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch
Allows WaitEventSet to released by resource owner.

0002-Asynchronous-execution-framework.patch
Asynchronous execution framework based on Robert's version. All
edits on this is merged.

0003-Make-postgres_fdw-async-capable.patch
Make postgres_fdw async-capable.

0004-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch

This can be merged to 0002 but I didn't since the usage of
using these pragmas is arguable.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-Asynchronous-execution-framework.patch	text/x-patch	47.1 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	50.9 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-10 23:19:25
Message-ID:	CADkLM=eHssZhc2tC4t6C=8uGn2LV=NE6LuzavZc3+=mFW0AC0A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Feb 23, 2017 at 6:59 AM, Kyotaro HORIGUCHI <
horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> 9e43e87

Patch fails on current master, but correctly applies to 9e43e87. Thanks for
including the commit id.

Regression tests pass.

As with my last attempt at reviewing this patch, I'm confused about what
kind of queries can take advantage of this patch. Is it only cases where a
local table has multiple inherited foreign table children? Will it work
with queries where two foreign tables are referenced and combined with a
UNION ALL?

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-13 02:33:00
Message-ID:	7847e2c8-2d8b-c692-4e0b-2544887ba606@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2017/03/11 8:19, Corey Huinker wrote:
>
> On Thu, Feb 23, 2017 at 6:59 AM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp <mailto:horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>>
> wrote:
>
> 9e43e87
>
>
> Patch fails on current master, but correctly applies to 9e43e87. Thanks
> for including the commit id.
>
> Regression tests pass.
>
> As with my last attempt at reviewing this patch, I'm confused about what
> kind of queries can take advantage of this patch. Is it only cases where a
> local table has multiple inherited foreign table children?

IIUC, Horiguchi-san's patch adds asynchronous capability for ForeignScan's
driven by postgres_fdw (after building some relevant infrastructure
first). The same might be added to other Scan nodes (and probably other
nodes as well) eventually so that more queries will benefit from
asynchronous execution. It may just be that ForeignScan's benefit more
from asynchronous execution than other Scan types.

> Will it work
> with queries where two foreign tables are referenced and combined with a
> UNION ALL?

I think it will, because Append itself has been made async-capable by one
of the patches and UNION ALL uses Append. But as mentioned above, only
the postgres_fdw foreign tables will be able to utilize this for now.

Thanks,
Amit

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-13 05:06:23
Message-ID:	CADkLM=dCp=JRAjiC5bcH2c5Ug3EOMrdDs+CiyZjogOHkHCfH8Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
>
> I think it will, because Append itself has been made async-capable by one
> of the patches and UNION ALL uses Append. But as mentioned above, only
> the postgres_fdw foreign tables will be able to utilize this for now.
>
>
Ok, I'll re-run my test from a few weeks back and see if anything has
changed.

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-13 21:31:17
Message-ID:	CADkLM=ffokmXLdc3QRAs74zLC84AE1YWWnWGOt9PQ8osTzzs1w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 13, 2017 at 1:06 AM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
wrote:

>
>> I think it will, because Append itself has been made async-capable by one
>> of the patches and UNION ALL uses Append. But as mentioned above, only
>> the postgres_fdw foreign tables will be able to utilize this for now.
>>
>>
> Ok, I'll re-run my test from a few weeks back and see if anything has
> changed.
>

I'm not able to discern any difference in plan between a 9.6 instance and
this patch.

The basic outline of my test is:

EXPLAIN ANALYZE
SELECT c1, c2, ..., cN FROM tab1 WHERE date = '1 day ago'
UNION ALL
SELECT c1, c2, ..., cN FROM tab2 WHERE date = '2 days ago'
UNION ALL
SELECT c1, c2, ..., cN FROM tab3 WHERE date = '3 days ago'
UNION ALL
SELECT c1, c2, ..., cN FROM tab4 WHERE date = '4 days ago'

I've tried this test where tab1 through tab4 all are the same postgres_fdw
foreign table.
I've tried this test where tab1 through tab4 all are different foreign
tables pointing to the same remote table sharing a the same server
definition.
I've tried this test where tab1 through tab4 all are different foreign
tables pointing each with it's own foreign server definition, all of which
happen to point to the same remote cluster.

Are there some postgresql.conf settings I should set to get a decent test?

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-14 00:25:53
Message-ID:	dd705978-add0-7503-e78c-2ae78f6e869d@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2017/03/14 6:31, Corey Huinker wrote:
> On Mon, Mar 13, 2017 at 1:06 AM, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
> wrote:
>
>>
>>> I think it will, because Append itself has been made async-capable by one
>>> of the patches and UNION ALL uses Append. But as mentioned above, only
>>> the postgres_fdw foreign tables will be able to utilize this for now.
>>>
>>>
>> Ok, I'll re-run my test from a few weeks back and see if anything has
>> changed.
>>
>
>
> I'm not able to discern any difference in plan between a 9.6 instance and
> this patch.
>
> The basic outline of my test is:
>
> EXPLAIN ANALYZE
> SELECT c1, c2, ..., cN FROM tab1 WHERE date = '1 day ago'
> UNION ALL
> SELECT c1, c2, ..., cN FROM tab2 WHERE date = '2 days ago'
> UNION ALL
> SELECT c1, c2, ..., cN FROM tab3 WHERE date = '3 days ago'
> UNION ALL
> SELECT c1, c2, ..., cN FROM tab4 WHERE date = '4 days ago'
>
>
> I've tried this test where tab1 through tab4 all are the same postgres_fdw
> foreign table.
> I've tried this test where tab1 through tab4 all are different foreign
> tables pointing to the same remote table sharing a the same server
> definition.
> I've tried this test where tab1 through tab4 all are different foreign
> tables pointing each with it's own foreign server definition, all of which
> happen to point to the same remote cluster.
>
> Are there some postgresql.conf settings I should set to get a decent test?

I don't think the plan itself will change as a result of applying this
patch. You might however be able to observe some performance improvement.

Thanks,
Amit

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-14 01:08:03
Message-ID:	CADkLM=fDOTnmJO4YQhvqMGoRGVzkCzZ0r0r4SUzM6aJn=d7x_g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
> I don't think the plan itself will change as a result of applying this
> patch. You might however be able to observe some performance improvement.
>
> Thanks,
> Amit
>

I could see no performance improvement, even with 16 separate queries
combined with UNION ALL. Query performance was always with +/- 10% of a 9.6
instance given the same script. I must be missing something.

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-14 01:28:36
Message-ID:	e7dc8128-f32b-ff9a-870e-f1117b8e4fa6@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2017/03/14 10:08, Corey Huinker wrote:
>> I don't think the plan itself will change as a result of applying this
>> patch. You might however be able to observe some performance improvement.
>
> I could see no performance improvement, even with 16 separate queries
> combined with UNION ALL. Query performance was always with +/- 10% of a 9.6
> instance given the same script. I must be missing something.

Hmm, maybe I'm missing something too.

Anyway, here is an older message on this thread from Horiguchi-san where
he shared some of the test cases that this patch improves performance for:

https://www.postgresql.org/message-id/20161018.103051.30820907.horiguchi.kyotaro%40lab.ntt.co.jp

From that message:

<quote>
I measured performance and had the following result.

The result is written as "time<ms> (std dev <ms>)"

sync
t0: 3820.33 ( 1.88)
pl: 1608.59 ( 12.06)
pf0: 7928.29 ( 46.58)
pf1: 8023.16 ( 26.43)

async
t0: 3806.31 ( 4.49) 0.4% faster (should be error)
pl: 1629.17 ( 0.29) 1.3% slower
pf0: 6447.07 ( 25.19) 18.7% faster
pf1: 1876.80 ( 47.13) 76.6% faster
</quote>

IIUC, pf0 and pf1 is the same test case (all 4 foreign tables target the
same server) measured with different implementations of the patch.

Thanks,
Amit

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-16 17:03:21
Message-ID:	CADkLM=cqfQL5+CDa6eVqNvM2oNXCR8gcgTML2r7mpetGx_e6GA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Mar 13, 2017 at 9:28 PM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
> wrote:

> On 2017/03/14 10:08, Corey Huinker wrote:
> >> I don't think the plan itself will change as a result of applying this
> >> patch. You might however be able to observe some performance
> improvement.
> >
> > I could see no performance improvement, even with 16 separate queries
> > combined with UNION ALL. Query performance was always with +/- 10% of a
> 9.6
> > instance given the same script. I must be missing something.
>
> Hmm, maybe I'm missing something too.
>
> Anyway, here is an older message on this thread from Horiguchi-san where
> he shared some of the test cases that this patch improves performance for:
>
> https://www.postgresql.org/message-id/20161018.103051.
> 30820907.horiguchi.kyotaro%40lab.ntt.co.jp
>
> From that message:
>
> <quote>
> I measured performance and had the following result.
>
> t0 - SELECT sum(a) FROM <local single table>;
> pl - SELECT sum(a) FROM <4 local children>;
> pf0 - SELECT sum(a) FROM <4 foreign children on single connection>;
> pf1 - SELECT sum(a) FROM <4 foreign children on dedicate connections>;
>
> The result is written as "time<ms> (std dev <ms>)"
>
> sync
> t0: 3820.33 ( 1.88)
> pl: 1608.59 ( 12.06)
> pf0: 7928.29 ( 46.58)
> pf1: 8023.16 ( 26.43)
>
> async
> t0: 3806.31 ( 4.49) 0.4% faster (should be error)
> pl: 1629.17 ( 0.29) 1.3% slower
> pf0: 6447.07 ( 25.19) 18.7% faster
> pf1: 1876.80 ( 47.13) 76.6% faster
> </quote>
>
> IIUC, pf0 and pf1 is the same test case (all 4 foreign tables target the
> same server) measured with different implementations of the patch.
>
> Thanks,
> Amit
>

I reworked the test such that all of the foreign tables inherit from the
same parent table, and if you query that you do get async execution. But It
doesn't work when just stringing together those foreign tables with UNION
ALLs.

I don't know how to proceed with this review if that was a goal of the
patch.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-16 20:17:03
Message-ID:	22644.1489695423@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> I reworked the test such that all of the foreign tables inherit from the
> same parent table, and if you query that you do get async execution. But It
> doesn't work when just stringing together those foreign tables with UNION
> ALLs.

> I don't know how to proceed with this review if that was a goal of the
> patch.

Whether it was a goal or not, I'd say there is something either broken
or incorrectly implemented if you don't see that. The planner (and
therefore also the executor) generally treats inheritance the same as
simple UNION ALL. If that's not the case here, I'd want to know why.

regards, tom lane

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-16 21:16:32
Message-ID:	CADkLM=cBZEX9L9HnhJYrtfiAN5Ebdu=xbvM_poWVGBR7yN3gVw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Mar 16, 2017 at 4:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> > I reworked the test such that all of the foreign tables inherit from the
> > same parent table, and if you query that you do get async execution. But
> It
> > doesn't work when just stringing together those foreign tables with UNION
> > ALLs.
>
> > I don't know how to proceed with this review if that was a goal of the
> > patch.
>
> Whether it was a goal or not, I'd say there is something either broken
> or incorrectly implemented if you don't see that. The planner (and
> therefore also the executor) generally treats inheritance the same as
> simple UNION ALL. If that's not the case here, I'd want to know why.
>
> regards, tom lane
>

Updated commitfest entry to "Returned With Feedback".

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	corey(dot)huinker(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-17 08:35:05
Message-ID:	20170317.173505.152063931.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Thu, 16 Mar 2017 17:16:32 -0400, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote in <CADkLM=cBZEX9L9HnhJYrtfiAN5Ebdu=xbvM_poWVGBR7yN3gVw(at)mail(dot)gmail(dot)com>
> On Thu, Mar 16, 2017 at 4:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > Corey Huinker <corey(dot)huinker(at)gmail(dot)com> writes:
> > > I reworked the test such that all of the foreign tables inherit from the
> > > same parent table, and if you query that you do get async execution. But
> > It
> > > doesn't work when just stringing together those foreign tables with UNION
> > > ALLs.
> >
> > > I don't know how to proceed with this review if that was a goal of the
> > > patch.
> >
> > Whether it was a goal or not, I'd say there is something either broken
> > or incorrectly implemented if you don't see that. The planner (and
> > therefore also the executor) generally treats inheritance the same as
> > simple UNION ALL. If that's not the case here, I'd want to know why.
> >
> > regards, tom lane
> >
>
> Updated commitfest entry to "Returned With Feedback".

Sorry for the absense. For information, I'll continue to write
some more.

At Tue, 14 Mar 2017 10:28:36 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <e7dc8128-f32b-ff9a-870e-f1117b8e4fa6(at)lab(dot)ntt(dot)co(dot)jp>
> async
> t0: 3806.31 ( 4.49) 0.4% faster (should be error)
> pl: 1629.17 ( 0.29) 1.3% slower
> pf0: 6447.07 ( 25.19) 18.7% faster
> pf1: 1876.80 ( 47.13) 76.6% faster
> </quote>
>
> IIUC, pf0 and pf1 is the same test case (all 4 foreign tables target the
> same server) measured with different implementations of the patch.

pf0 is measured on a partitioned(sharded) tables on one foreign
server, that is, sharing a connection. pf1 is in contrast sharded
tables that have dedicate server (or connection). The parent
server is async-patched and the child server is not patched.

Async-capable plan is generated in planner. An Append contains at
least one async-capable child becomes async-aware Append. So the
async feature should be effective also for the UNION ALL case.

The following will work faster than unpatched version.I

SELECT sum(a) FROM (SELECT a FROM ft10 UNION ALL SELECT a FROM ft20 UNION ALL SELECT a FROM ft30 UNION ALL SELECT a FROM ft40) as ft;

I'll measure the performance for the case next week.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0002-Asynchronous-execution-framework.patch	text/x-patch	46.7 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	50.9 KB
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	corey(dot)huinker(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-03-21 03:16:12
Message-ID:	20170321.121612.253115026.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello. This is the final report in this CF period.

At Fri, 17 Mar 2017 17:35:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170317(dot)173505(dot)152063931(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Async-capable plan is generated in planner. An Append contains at
> least one async-capable child becomes async-aware Append. So the
> async feature should be effective also for the UNION ALL case.
>
> The following will work faster than unpatched version.I
>
> SELECT sum(a) FROM (SELECT a FROM ft10 UNION ALL SELECT a FROM ft20 UNION ALL SELECT a FROM ft30 UNION ALL SELECT a FROM ft40) as ft;
>
> I'll measure the performance for the case next week.

I found that the following query works as the same as partitioned
table.

SELECT sum(a) FROM (SELECT a FROM ft10 UNION ALL SELECT a FROM ft20 UNION ALL SELECT a FROM ft30 UNION ALL SELECT a FROM ft40 UNION ALL *SELECT a FROM ONLY pf0*) as ft;

So, the difference comes from the additional async-uncapable
child (faster if contains any). In both cases, Append node runs
children asynchronously but slightly differently when all
async-capable children are busy.

I'll continue working on this from this point aiming to the next
commit fest.

Thank you for valuable feedback.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-04-02 16:21:14
Message-ID:	CADkLM=dN_vt8kazOoiVOfjN6xFHpzf5uiGJz+iN+f4fLbYwSKA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>
>
> I'll continue working on this from this point aiming to the next
> commit fest.
>
>
This probably will not surprise you given the many commits in the past 2
weeks, but the patches no longer apply to master:

$ git apply
~/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch
/home/ubuntu/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch:27:
trailing whitespace.
FeBeWaitSet = CreateWaitEventSet(TopMemoryContext, NULL, 3);
/home/ubuntu/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch:39:
trailing whitespace.
#include "utils/resowner_private.h"
/home/ubuntu/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch:47:
trailing whitespace.
ResourceOwner resowner; /* Resource owner */
/home/ubuntu/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch:48:
trailing whitespace.

/home/ubuntu/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch:57:
trailing whitespace.
WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, NULL,
3);
error: patch failed: src/backend/libpq/pqcomm.c:201
error: src/backend/libpq/pqcomm.c: patch does not apply
error: patch failed: src/backend/storage/ipc/latch.c:61
error: src/backend/storage/ipc/latch.c: patch does not apply
error: patch failed: src/backend/storage/lmgr/condition_variable.c:66
error: src/backend/storage/lmgr/condition_variable.c: patch does not apply
error: patch failed: src/backend/utils/resowner/resowner.c:124
error: src/backend/utils/resowner/resowner.c: patch does not apply
error: patch failed: src/include/storage/latch.h:101
error: src/include/storage/latch.h: patch does not apply
error: patch failed: src/include/utils/resowner_private.h:18
error: src/include/utils/resowner_private.h: patch does not apply

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	corey(dot)huinker(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-04-04 10:25:39
Message-ID:	20170404.192539.29699823.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

At Sun, 2 Apr 2017 12:21:14 -0400, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote in <CADkLM=dN_vt8kazOoiVOfjN6xFHpzf5uiGJz+iN+f4fLbYwSKA(at)mail(dot)gmail(dot)com>
> >
> >
> > I'll continue working on this from this point aiming to the next
> > commit fest.
> >
> >
> This probably will not surprise you given the many commits in the past 2
> weeks, but the patches no longer apply to master:

Yeah, I won't surprise by that but thank you for noticing
me. Greately reduces the difficulty of merging. Thank you.

> $ git apply
> ~/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch
> /home/ubuntu/async/0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch:27:
> trailing whitespace.

Maybe the patch was retrieved on Windows then transferred to
Linux box. Converting EOLs of the files or some git configuration
might save that. (git am has --no-keep-cr but I haven't find that
for git apply)

The attached patch is rebased on the current master, but no
substantial changes other than disallowing partitioned tables on
async by assertion.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-Asynchronous-execution-framework.patch	text/x-patch	47.0 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	50.9 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0005-Fix-a-typo-of-mcxt.c.patch	text/x-patch	934 bytes

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	corey(dot)huinker(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-05-22 04:12:14
Message-ID:	20170522.131214.20936668.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello.

At Tue, 04 Apr 2017 19:25:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170404(dot)192539(dot)29699823(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> The attached patch is rebased on the current master, but no
> substantial changes other than disallowing partitioned tables on
> async by assertion.

This is just rebased onto the current master (d761fe2).
I'll recheck further detail after this.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-Asynchronous-execution-framework.patch	text/x-patch	30.0 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	52.9 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of.patch	text/x-patch	1.3 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	corey(dot)huinker(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-05-22 05:22:53
Message-ID:	20170522.142253.40733906.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Mon, 22 May 2017 13:12:14 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170522(dot)131214(dot)20936668(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > The attached patch is rebased on the current master, but no
> > substantial changes other than disallowing partitioned tables on
> > async by assertion.
>
> This is just rebased onto the current master (d761fe2).
> I'll recheck further detail after this.

Sorry, the patch was missing some files to add.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-Asynchronous-execution-framework.patch	text/x-patch	47.0 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	52.9 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of.patch	text/x-patch	1.3 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	corey(dot)huinker(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp, robertmhaas(at)gmail(dot)com, amitdkhan(dot)pg(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-06-22 05:17:20
Message-ID:	20170622.141720.190958545.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

The patch got conflicted. This is a new version just rebased to
the current master. Furtuer amendment will be taken later.

> The attached patch is rebased on the current master, but no
> substantial changes other than disallowing partitioned tables on
> async by assertion.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-Asynchronous-execution-framework.patch	text/x-patch	47.0 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	52.6 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-06-28 08:23:54
Message-ID:	4579.1498638234@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> The patch got conflicted. This is a new version just rebased to
> the current master. Furtuer amendment will be taken later.

Can you please explain this part of make_append() ?

/* Currently async on partitioned tables is not available */
Assert(nasyncplans == 0 || partitioned_rels == NIL);

I don't think the output of Append plan is supposed to be ordered even if the
underlying relation is partitioned. Besides ordering, is there any other
reason not to use the asynchronous execution?

And even if there was some, the planner should ensure that executor does not
fire the assertion statement above. The script attached shows an example how
to cause the assertion failure.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Attachment	Content-Type	Size
async_append_test.sh	text/plain	1.7 KB

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-06-28 18:22:24
Message-ID:	392.1498674144@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> The patch got conflicted. This is a new version just rebased to
> the current master. Furtuer amendment will be taken later.

Just one idea that I had while reading the code.

In ExecAsyncEventLoop you iterate estate->es_pending_async, then move the
complete requests to the end and finaly adjust estate->es_num_pending_async so
that the array no longer contains the complete requests. I think the point is
that then you can add new requests to the end of the array.

I wonder if a set (Bitmapset) of incomplete requests would make the code more
efficient. The set would contain position of each incomplete request in
estate->es_num_pending_async (I think it's the myindex field of
PendingAsyncRequest). If ExecAsyncEventLoop used this set to retrieve the
requests subject to ExecAsyncNotify etc, then the compaction of
estate->es_pending_async wouldn't be necessary.

ExecAsyncRequest would use the set to look for space for new requests by
iterating it and trying to find the first gap (which corresponds to completed
request).

And finally, item would be removed from the set at the moment the request
state is being set to ASYNCREQ_COMPLETE.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	ah(at)cybertec(dot)at
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-06-29 04:45:51
Message-ID:	20170629.134551.108851183.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for looking this.

At Wed, 28 Jun 2017 10:23:54 +0200, Antonin Houska <ah(at)cybertec(dot)at> wrote in <4579(dot)1498638234(at)localhost>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> > The patch got conflicted. This is a new version just rebased to
> > the current master. Furtuer amendment will be taken later.
>
> Can you please explain this part of make_append() ?
>
> /* Currently async on partitioned tables is not available */
> Assert(nasyncplans == 0 || partitioned_rels == NIL);
>
> I don't think the output of Append plan is supposed to be ordered even if the
> underlying relation is partitioned. Besides ordering, is there any other
> reason not to use the asynchronous execution?

It was just a developmental sentinel that will remind me later to
consider the declarative partitions since I didn't have an idea
of the differences (or the similarity) between appendrels and
partitioned_rels. It is never to say the condition cannot
make. I'll check it out and will support partitioned_rels sooner.
Sorry for having left it as it is.

> And even if there was some, the planner should ensure that executor does not
> fire the assertion statement above. The script attached shows an example how
> to cause the assertion failure.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, ah(at)cybertec(dot)at
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-06-29 05:08:27
Message-ID:	63a5a01c-2967-83e0-8bbf-c981404f529e@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2017/06/29 13:45, Kyotaro HORIGUCHI wrote:
> Thank you for looking this.
>
> At Wed, 28 Jun 2017 10:23:54 +0200, Antonin Houska wrote:
>> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>>
>>> The patch got conflicted. This is a new version just rebased to
>>> the current master. Furtuer amendment will be taken later.
>>
>> Can you please explain this part of make_append() ?
>>
>> /* Currently async on partitioned tables is not available */
>> Assert(nasyncplans == 0 || partitioned_rels == NIL);
>>
>> I don't think the output of Append plan is supposed to be ordered even if the
>> underlying relation is partitioned. Besides ordering, is there any other
>> reason not to use the asynchronous execution?
>
> It was just a developmental sentinel that will remind me later to
> consider the declarative partitions since I didn't have an idea
> of the differences (or the similarity) between appendrels and
> partitioned_rels. It is never to say the condition cannot
> make. I'll check it out and will support partitioned_rels sooner.
> Sorry for having left it as it is.

When making an Append for a partitioned table, among the arguments passed
to make_append(), 'partitioned_rels' is a list of RT indexes of
partitioned tables in the inheritance tree of which the aforementioned
partitioned table is the root. 'appendplans' is a list of subplans for
scanning the leaf partitions in the tree. Note that the 'appendplans'
list contains no members corresponding to the partitioned tables, because
we don't need to scan them (only leaf relations contain any data).

The point of having the 'partitioned_rels' list in the resulting Append
plan is so that the executor can identify those relations and take the
appropriate locks on them.

Thanks,
Amit

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp
Cc:	ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-03 08:57:22
Message-ID:	20170703.175722.238947946.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi, I've returned.

At Thu, 29 Jun 2017 14:08:27 +0900, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote in <63a5a01c-2967-83e0-8bbf-c981404f529e(at)lab(dot)ntt(dot)co(dot)jp>
> Hi,
>
> On 2017/06/29 13:45, Kyotaro HORIGUCHI wrote:
> > Thank you for looking this.
> >
> > At Wed, 28 Jun 2017 10:23:54 +0200, Antonin Houska wrote:
> >> Can you please explain this part of make_append() ?
> >>
> >> /* Currently async on partitioned tables is not available */
> >> Assert(nasyncplans == 0 || partitioned_rels == NIL);
> >>
> >> I don't think the output of Append plan is supposed to be ordered even if the
> >> underlying relation is partitioned. Besides ordering, is there any other
> >> reason not to use the asynchronous execution?
>
> When making an Append for a partitioned table, among the arguments passed
> to make_append(), 'partitioned_rels' is a list of RT indexes of
> partitioned tables in the inheritance tree of which the aforementioned
> partitioned table is the root. 'appendplans' is a list of subplans for
> scanning the leaf partitions in the tree. Note that the 'appendplans'
> list contains no members corresponding to the partitioned tables, because
> we don't need to scan them (only leaf relations contain any data).
>
> The point of having the 'partitioned_rels' list in the resulting Append
> plan is so that the executor can identify those relations and take the
> appropriate locks on them.

Amit, thank you for the detailed explanation. I understand what
it is and that just ignoring it is enough, then confirmed that
actually works as before.

I'll then adresss Antonin's comments tomorrow.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	ah(at)cybertec(dot)at
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-04 04:08:46
Message-ID:	20170704.130846.251038264.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for the thought.

This is at PoC level so I'd be grateful for this kind of
fundamental comments.

At Wed, 28 Jun 2017 20:22:24 +0200, Antonin Houska <ah(at)cybertec(dot)at> wrote in <392(dot)1498674144(at)localhost>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> > The patch got conflicted. This is a new version just rebased to
> > the current master. Furtuer amendment will be taken later.
>
> Just one idea that I had while reading the code.
>
> In ExecAsyncEventLoop you iterate estate->es_pending_async, then move the
> complete requests to the end and finaly adjust estate->es_num_pending_async so
> that the array no longer contains the complete requests. I think the point is
> that then you can add new requests to the end of the array.
>
> I wonder if a set (Bitmapset) of incomplete requests would make the code more
> efficient. The set would contain position of each incomplete request in
> estate->es_num_pending_async (I think it's the myindex field of
> PendingAsyncRequest). If ExecAsyncEventLoop used this set to retrieve the
> requests subject to ExecAsyncNotify etc, then the compaction of
> estate->es_pending_async wouldn't be necessary.
>
> ExecAsyncRequest would use the set to look for space for new requests by
> iterating it and trying to find the first gap (which corresponds to completed
> request).
>
> And finally, item would be removed from the set at the moment the request
> state is being set to ASYNCREQ_COMPLETE.

Effectively it is a waiting-queue followed by a
completed-list. The point of the compaction is keeping the order
of waiting or not-yet-completed requests, which is crucial to
avoid kind-a precedence inversion. We cannot keep the order by
using bitmapset in such way.

The current code waits all waiters at once and processes all
fired events at once. The order in the waiting-queue is
inessential in the case. On the other hand I suppoese waiting on
several-tens to near-hundred remote hosts is in a realistic
target range. Keeping the order could be crucial if we process a
part of the queue at once in the case.

Putting siginificance on the deviation of response time of
remotes, process-all-at-once is effective. In turn we should
consider the effectiveness of the lifecycle of the larger wait
event set.

Sorry for the discursive discussion but in short, I have noticed
that I have a lot to consider on this:p Thanks!

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Antonin Houska <ah(at)cybertec(dot)at>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-11 08:28:51
Message-ID:	6448.1499761731@localhost
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> > Just one idea that I had while reading the code.
> >
> > In ExecAsyncEventLoop you iterate estate->es_pending_async, then move the
> > complete requests to the end and finaly adjust estate->es_num_pending_async so
> > that the array no longer contains the complete requests. I think the point is
> > that then you can add new requests to the end of the array.
> >
> > I wonder if a set (Bitmapset) of incomplete requests would make the code more
> > efficient. The set would contain position of each incomplete request in
> > estate->es_num_pending_async (I think it's the myindex field of
> > PendingAsyncRequest). If ExecAsyncEventLoop used this set to retrieve the
> > requests subject to ExecAsyncNotify etc, then the compaction of
> > estate->es_pending_async wouldn't be necessary.
> >
> > ExecAsyncRequest would use the set to look for space for new requests by
> > iterating it and trying to find the first gap (which corresponds to completed
> > request).
> >
> > And finally, item would be removed from the set at the moment the request
> > state is being set to ASYNCREQ_COMPLETE.
>
> Effectively it is a waiting-queue followed by a
> completed-list. The point of the compaction is keeping the order
> of waiting or not-yet-completed requests, which is crucial to
> avoid kind-a precedence inversion. We cannot keep the order by
> using bitmapset in such way.

> The current code waits all waiters at once and processes all
> fired events at once. The order in the waiting-queue is
> inessential in the case. On the other hand I suppoese waiting on
> several-tens to near-hundred remote hosts is in a realistic
> target range. Keeping the order could be crucial if we process a
> part of the queue at once in the case.
>
> Putting siginificance on the deviation of response time of
> remotes, process-all-at-once is effective. In turn we should
> consider the effectiveness of the lifecycle of the larger wait
> event set.

ok, I missed the fact that the order of es_pending_async entries is
important. I think this is worth adding a comment.

Actually the reason I thought of simplification was that I noticed small
inefficiency in the way you do the compaction. In particular, I think it's not
always necessary to swap the tail and head entries. Would something like this
make sense?

/* If any node completed, compact the array. */
if (any_node_done)
{
int hidx = 0,
tidx;

/*
* Swap all non-yet-completed items to the start of the array.
* Keep them in the same order.
*/
for (tidx = 0; tidx < estate->es_num_pending_async; ++tidx)
{
PendingAsyncRequest *tail = estate->es_pending_async[tidx];

Assert(tail->state != ASYNCREQ_CALLBACK_PENDING);

if (tail->state == ASYNCREQ_COMPLETE)
continue;

/*
* If the array starts with one or more incomplete requests,
* both head and tail point at the same item, so there's no
* point in swapping.
*/
if (tidx > hidx)
{
PendingAsyncRequest *head = estate->es_pending_async[hidx];

/*
* Once the tail got ahead, it should only leave
* ASYNCREQ_COMPLETE behind. Only those can then be seen
* by head.
*/
Assert(head->state == ASYNCREQ_COMPLETE);

estate->es_pending_async[tidx] = head;
estate->es_pending_async[hidx] = tail;
}

++hidx;
}

estate->es_num_pending_async = hidx;
}

And besides that, I think it'd be more intuitive if the meaning of "head" and
"tail" was reversed: if the array is iterated from lower to higher positions,
then I'd consider head to be at higher position, not tail.

--
Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener
Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	ah(at)cybertec(dot)at
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-18 07:24:52
Message-ID:	20170718.162452.221576658.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

At Tue, 11 Jul 2017 10:28:51 +0200, Antonin Houska <ah(at)cybertec(dot)at> wrote in <6448(dot)1499761731(at)localhost>
> Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Effectively it is a waiting-queue followed by a
> > completed-list. The point of the compaction is keeping the order
> > of waiting or not-yet-completed requests, which is crucial to
> > avoid kind-a precedence inversion. We cannot keep the order by
> > using bitmapset in such way.
>
> > The current code waits all waiters at once and processes all
> > fired events at once. The order in the waiting-queue is
> > inessential in the case. On the other hand I suppoese waiting on
> > several-tens to near-hundred remote hosts is in a realistic
> > target range. Keeping the order could be crucial if we process a
> > part of the queue at once in the case.
> >
> > Putting siginificance on the deviation of response time of
> > remotes, process-all-at-once is effective. In turn we should
> > consider the effectiveness of the lifecycle of the larger wait
> > event set.
>
> ok, I missed the fact that the order of es_pending_async entries is
> important. I think this is worth adding a comment.

I'll put an upper limit to the number of waiters processed at
once. Then add a comment like that.

> Actually the reason I thought of simplification was that I noticed small
> inefficiency in the way you do the compaction. In particular, I think it's not
> always necessary to swap the tail and head entries. Would something like this
> make sense?

I'm not sure, but I suppose that it is rare that all of the first
many elements in the array are not COMPLETE. In most cases the
first element gets a response first.

>
> /* If any node completed, compact the array. */
> if (any_node_done)
> {
...
> for (tidx = 0; tidx < estate->es_num_pending_async; ++tidx)
> {
...
> if (tail->state == ASYNCREQ_COMPLETE)
> continue;
>
> /*
> * If the array starts with one or more incomplete requests,
> * both head and tail point at the same item, so there's no
> * point in swapping.
> */
> if (tidx > hidx)
> {

This works to skip the first several elements when all of them
are ASYNCREQ_COMPLETE. I think it makes sense as long as it
doesn't harm the loop. The optimization is more effective by
putting out of the loop like this.

| for (tidx = 0; tidx < estate->es_num_pending_async &&
estate->es_pending_async[tidx] == ASYNCREQ_COMPLETE; ++tidx);
| for (; tidx < estate->es_num_pending_async; ++tidx)
...

> And besides that, I think it'd be more intuitive if the meaning of "head" and
> "tail" was reversed: if the array is iterated from lower to higher positions,
> then I'd consider head to be at higher position, not tail.

Yeah, but maybe the "head" is still confusing even if reversed
because it is still not a head of something. It might be less
confusing by rewriting it in more verbose-but-straightforwad way.

| int npending = 0;
|
| /* Skip over not-completed items at the beginning */
| while (npending < estate->es_num_pending_async &&
| estate->es_pending_async[npending] != ASYNCREQ_COMPLETE)
| npending++;
|
| /* Scan over the rest for not-completed items */
| for (i = npending + 1 ; i < estate->es_num_pending_async; ++i)
| {
| PendingAsyncRequest *tmp;
| PendingAsyncRequest *curr = estate->es_pending_async[i];
|
| if (curr->state == ASYNCREQ_COMPLETE)
| continue;
|
| /* Move the not-completed item to the tail of the first chunk */
| tmp = estate->es_pending_async[i];
| estate->es_pending_async[nepending] = tmp;
| estate->es_pending_async[i] = tmp;
| ++npending;
| }

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	ah(at)cybertec(dot)at
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-25 09:11:25
Message-ID:	20170725.181125.53006939.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

8bf58c0d9bd33686 badly conflicts with this patch, so I'll rebase
this and added a patch to refactor the function that Anotonin
pointed. This would be merged into 0002 patch.

At Tue, 18 Jul 2017 16:24:52 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170718(dot)162452(dot)221576658(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> I'll put an upper limit to the number of waiters processed at
> once. Then add a comment like that.
>
> > Actually the reason I thought of simplification was that I noticed small
> > inefficiency in the way you do the compaction. In particular, I think it's not
> > always necessary to swap the tail and head entries. Would something like this
> > make sense?
>
> I'm not sure, but I suppose that it is rare that all of the first
> many elements in the array are not COMPLETE. In most cases the
> first element gets a response first.
...
> Yeah, but maybe the "head" is still confusing even if reversed
> because it is still not a head of something. It might be less
> confusing by rewriting it in more verbose-but-straightforwad way.
>
>
> | int npending = 0;
> |
> | /* Skip over not-completed items at the beginning */
> | while (npending < estate->es_num_pending_async &&
> | estate->es_pending_async[npending] != ASYNCREQ_COMPLETE)
> | npending++;
> |
> | /* Scan over the rest for not-completed items */
> | for (i = npending + 1 ; i < estate->es_num_pending_async; ++i)
> | {
> | PendingAsyncRequest *tmp;
> | PendingAsyncRequest *curr = estate->es_pending_async[i];
> |
> | if (curr->state == ASYNCREQ_COMPLETE)
> | continue;
> |
> | /* Move the not-completed item to the tail of the first chunk */
> | tmp = estate->es_pending_async[i];
> | estate->es_pending_async[nepending] = tmp;
> | estate->es_pending_async[i] = tmp;
> | ++npending;
> | }

The last patch does something like this (with apparent bugs
fixed)

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-Asynchronous-execution-framework.patch	text/x-patch	46.9 KB
0003-Make-postgres_fdw-async-capable.patch	text/x-patch	51.7 KB
0004-Apply-unlikely-to-suggest-synchronous-route-of-ExecA.patch	text/x-patch	1.3 KB
0005-Refactor-ExecAsyncEventLoop.patch	text/x-patch	2.3 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Antonin Houska <ah(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2017-07-26 21:16:43
Message-ID:	CA+TgmoYrbgTBnLwnr1v=pk+C=znWg7AgV9=M9ehrq6TDexPQNw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jul 25, 2017 at 5:11 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> [ new patches ]

I spent some time today refreshing my memory of what's going with this
thread today.

Ostensibly, the advantage of this framework over my previous proposal
is that it avoids inserting anything into ExecProcNode(), which is
probably a good thing to avoid given how frequently ExecProcNode() is
called. Unless the parent and the child both know about asynchronous
execution and choose to use it, everything runs exactly as it does
today and so there is no possibility of a complaint about a
performance hit. As far as it goes, that is good.

However, at a deeper level, I fear we haven't really solved the
problem. If an Append is directly on top of a ForeignScan node, then
this will work. But if an Append is indirectly on top of a
ForeignScan node with some other stuff in the middle, then it won't -
unless we make whichever nodes appear between the Append and the
ForeignScan async-capable. Indeed, we'd really want all kinds of
joins and aggregates to be async-capable so that examples like the one
Corey asked about in
http://postgr.es/m/CADkLM=fuvVdKvz92XpCRnb4=rj6bLOhSLifQ3RV=Sb4Q5rJsRA@mail.gmail.com
will work.

But if we do, then I fear we'll just be reintroducing the same
performance regression that we introduced by switching to this
framework from the previous one - or maybe a different one, but a
regression all the same. Every type of intermediate node will have to
have a code path where it uses ExecAsyncRequest() /
ExecAyncHogeResponse() rather than ExecProcNode() to get tuples, and
it seems like that will either end up duplicating a lot of code from
the regular code path or, alternatively, polluting the regular code
path with some of the async code's concerns to avoid duplication, and
maybe slowing things down.

Maybe that concern is unjustified; I'm not sure. Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Antonin Houska <ah(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2017-07-26 21:43:49
Message-ID:	12815.1501105429@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Ostensibly, the advantage of this framework over my previous proposal
> is that it avoids inserting anything into ExecProcNode(), which is
> probably a good thing to avoid given how frequently ExecProcNode() is
> called. Unless the parent and the child both know about asynchronous
> execution and choose to use it, everything runs exactly as it does
> today and so there is no possibility of a complaint about a
> performance hit. As far as it goes, that is good.

> However, at a deeper level, I fear we haven't really solved the
> problem. If an Append is directly on top of a ForeignScan node, then
> this will work. But if an Append is indirectly on top of a
> ForeignScan node with some other stuff in the middle, then it won't -
> unless we make whichever nodes appear between the Append and the
> ForeignScan async-capable.

I have not been paying any attention to this thread whatsoever,
but I wonder if you can address your problem by building on top of
the ExecProcNode replacement that Andres is working on,
https://www.postgresql.org/message-id/20170726012641.bmhfcp5ajpudihl6@alap3.anarazel.de

The scheme he has allows $extra_stuff to be injected into ExecProcNode at
no cost when $extra_stuff is not needed, because you simply don't insert
the wrapper function when it's not needed. I'm not sure that it will
scale well to several different kinds of insertions though, for instance
if you wanted both instrumentation and async support on the same node.
But maybe those two couldn't be arms-length from each other anyway,
in which case it might be fine as-is.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, Antonin Houska <ah(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2017-07-27 10:39:51
Message-ID:	CA+Tgmoa=ke_zfucOAa3YEUnBSC=FSXn8SU2aYc8PGBBp=Yy9fw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jul 26, 2017 at 5:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I have not been paying any attention to this thread whatsoever,
> but I wonder if you can address your problem by building on top of
> the ExecProcNode replacement that Andres is working on,
> https://www.postgresql.org/message-id/20170726012641.bmhfcp5ajpudihl6@alap3.anarazel.de
>
> The scheme he has allows $extra_stuff to be injected into ExecProcNode at
> no cost when $extra_stuff is not needed, because you simply don't insert
> the wrapper function when it's not needed. I'm not sure that it will
> scale well to several different kinds of insertions though, for instance
> if you wanted both instrumentation and async support on the same node.
> But maybe those two couldn't be arms-length from each other anyway,
> in which case it might be fine as-is.

Yeah, I don't quite see how that would apply in this case -- what we
need here is not as simple as just conditionally injecting an extra
bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-28 08:31:29
Message-ID:	20170728.173129.214754483.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for the comment.

At Wed, 26 Jul 2017 17:16:43 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoYrbgTBnLwnr1v=pk+C=znWg7AgV9=M9ehrq6TDexPQNw(at)mail(dot)gmail(dot)com>
> But if we do, then I fear we'll just be reintroducing the same
> performance regression that we introduced by switching to this
> framework from the previous one - or maybe a different one, but a
> regression all the same. Every type of intermediate node will have to
> have a code path where it uses ExecAsyncRequest() /
> ExecAyncHogeResponse() rather than ExecProcNode() to get tuples, and

I understand what Robert concerns and I think I share the same
opinion. It needs further different framework.

At Thu, 27 Jul 2017 06:39:51 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+Tgmoa=ke_zfucOAa3YEUnBSC=FSXn8SU2aYc8PGBBp=Yy9fw(at)mail(dot)gmail(dot)com>
> On Wed, Jul 26, 2017 at 5:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I have not been paying any attention to this thread whatsoever,
> > but I wonder if you can address your problem by building on top of
> > the ExecProcNode replacement that Andres is working on,
> > https://www.postgresql.org/message-id/20170726012641.bmhfcp5ajpudihl6@alap3.anarazel.de
> >
> > The scheme he has allows $extra_stuff to be injected into ExecProcNode at
> > no cost when $extra_stuff is not needed, because you simply don't insert
> > the wrapper function when it's not needed. I'm not sure that it will
> > scale well to several different kinds of insertions though, for instance
> > if you wanted both instrumentation and async support on the same node.
> > But maybe those two couldn't be arms-length from each other anyway,
> > in which case it might be fine as-is.
>
> Yeah, I don't quite see how that would apply in this case -- what we
> need here is not as simple as just conditionally injecting an extra
> bit.

Thank you for the pointer, Tom. The subject (segfault in HEAD...)
haven't made me think that this kind of discussion was held
there. Anyway it seems very closer to asynchronous execution so
I'll catch up it considering how I can associate with this.

Regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-07-31 09:42:45
Message-ID:	20170731.184245.06849084.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Fri, 28 Jul 2017 17:31:05 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170728(dot)173105(dot)238045591(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> Thank you for the comment.
>
> At Wed, 26 Jul 2017 17:16:43 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmoYrbgTBnLwnr1v=pk+C=znWg7AgV9=M9ehrq6TDexPQNw(at)mail(dot)gmail(dot)com>
> > regression all the same. Every type of intermediate node will have to
> > have a code path where it uses ExecAsyncRequest() /
> > ExecAyncHogeResponse() rather than ExecProcNode() to get tuples, and
>
> I understand what Robert concerns and I share the same
> opinion. It needs further different framework.
>
> At Thu, 27 Jul 2017 06:39:51 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+Tgmoa=ke_zfucOAa3YEUnBSC=FSXn8SU2aYc8PGBBp=Yy9fw(at)mail(dot)gmail(dot)com>
> > On Wed, Jul 26, 2017 at 5:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > The scheme he has allows $extra_stuff to be injected into ExecProcNode at
> > > no cost when $extra_stuff is not needed, because you simply don't insert
> > > the wrapper function when it's not needed. I'm not sure that it will
...
> > Yeah, I don't quite see how that would apply in this case -- what we
> > need here is not as simple as just conditionally injecting an extra
> > bit.
>
> Thank you for the pointer, Tom. The subject (segfault in HEAD...)
> haven't made me think that this kind of discussion was held
> there. Anyway it seems very closer to asynchronous execution so
> I'll catch up it considering how I can associate with this.

I understand the executor change which has just been made at
master based on the pointed thread. This seems to have the
capability to let exec-node switch to async-aware with no extra
cost on non-async processing. So it would be doable to (just)
*shrink* the current framework by detaching the async-aware side
of the API. But to get the most out of asynchrony, it is required
that multiple async-capable nodes distributed under async-unaware
nodes run simultaneously.

There seems two ways to achieve this.

One is propagating required-async-nodes bitmap up to the topmost
node and waiting for the all required nodes to become ready. In
the long run this requires all nodes to be async-aware and that
apparently would have bad effect on performance to async-unaware
nodes containing async-capable nodes.

Another is getting rid of recursive call to run an execution
tree. It is perhaps the same to what mentioned as "data-centric
processing" in a previous threads [1], [2], but I'd like to I pay
attention on the aspect of "enableing to resume execution tree
from arbitrary leaf node". So I'm considering to realize it
still in one-tuple-by-one manner instead of collecting all tuples
of a leaf node first. Even though I'm not sure it is doable.

[1] https://www.postgresql.org/message-id/BF2827DCCE55594C8D7A8F7FFD3AB77159A9B904@szxeml521-mbs.china.huawei.com
[2] https://www.postgresql.org/message-id/20160629183254.frcm3dgg54ud5m6o@alap3.anarazel.de

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Antonin Houska <ah(at)cybertec(dot)at>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: asynchronous execution
Date:	2017-08-01 20:27:41
Message-ID:	CA+TgmobbZrBPb7cvFj3ACPX2A_qSEB4ughRmB5dkGPXUYx_E+Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 31, 2017 at 5:42 AM, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> Another is getting rid of recursive call to run an execution
> tree.

That happens to be exactly what Andres did for expression evaluation
in commit b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755, and I think
generalizing that to include the plan tree as well as expression trees
is likely to be the long-term way forward here. Unfortunately, that's
probably another gigantic patch (that should probably be written by
Andres).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-08-03 00:30:57
Message-ID:	20170803.093057.261590619.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thank you for the comment.

At Tue, 1 Aug 2017 16:27:41 -0400, Robert Haas <robertmhaas(at)gmail(dot)com> wrote in <CA+TgmobbZrBPb7cvFj3ACPX2A_qSEB4ughRmB5dkGPXUYx_E+Q(at)mail(dot)gmail(dot)com>
> On Mon, Jul 31, 2017 at 5:42 AM, Kyotaro HORIGUCHI
> <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> > Another is getting rid of recursive call to run an execution
> > tree.
>
> That happens to be exactly what Andres did for expression evaluation
> in commit b8d7f053c5c2bf2a7e8734fe3327f6a8bc711755, and I think
> generalizing that to include the plan tree as well as expression trees
> is likely to be the long-term way forward here.

I read it in the source tree. The patch implements converting
expression tree to an intermediate expression then run it on a
custom-made interpreter. Guessing from the word "upside down"
from Andres, the whole thing will become source-driven.

> Unfortunately, that's probably another gigantic patch (that
> should probably be written by Andres).

Yeah, but async executor on the current style of executor seems
furtile work, or sitting until the patch comes is also waste of
time. So I'm planning to include the following sutff in the next
PoC patch. Even I'm not sure it can land on the coming
Andres'patch.

- Tuple passing outside call-stack. (I remember it was in the
past of the thread around but not found)

This should be included in the Andres' patch.

- Give executor an ability to run from data-source (or driver)
nodes to the root.

I'm not sure this is included, but I suppose he is aiming this
kind of thing.

- Rebuid asynchronous execution on the upside-down executor.

regrds,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-08-31 12:52:36
Message-ID:	20170831.215236.135328985.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Thu, 03 Aug 2017 09:30:57 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170803(dot)093057(dot)261590619(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > Unfortunately, that's probably another gigantic patch (that
> > should probably be written by Andres).
>
> Yeah, but async executor on the current style of executor seems
> furtile work, or sitting until the patch comes is also waste of
> time. So I'm planning to include the following sutff in the next
> PoC patch. Even I'm not sure it can land on the coming
> Andres'patch.
>
> - Tuple passing outside call-stack. (I remember it was in the
> past of the thread around but not found)
>
> This should be included in the Andres' patch.
>
> - Give executor an ability to run from data-source (or driver)
> nodes to the root.
>
> I'm not sure this is included, but I suppose he is aiming this
> kind of thing.
>
> - Rebuid asynchronous execution on the upside-down executor.

Anyway, I modified ExecProcNode into push-up form and it *seems*
working to some extent. But trigger and cursors are almost broken
and several other regressions fail. Some nodes such like
windowagg are terriblly difficult to change to this push-up form
(using state machine). And of course it is terribly inefficient.

I'm afraid that all of this turns out to be in vain. But anyway,
and FWIW, I'll show the work to here after some cleansing work on
it.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-09-04 04:57:23
Message-ID:	20170904.135723.240320463.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Thu, 31 Aug 2017 21:52:36 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170831(dot)215236(dot)135328985(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Thu, 03 Aug 2017 09:30:57 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20170803(dot)093057(dot)261590619(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > Unfortunately, that's probably another gigantic patch (that
> > > should probably be written by Andres).
> >
> > Yeah, but async executor on the current style of executor seems
> > furtile work, or sitting until the patch comes is also waste of
> > time. So I'm planning to include the following sutff in the next
> > PoC patch. Even I'm not sure it can land on the coming
> > Andres'patch.
> >
> > - Tuple passing outside call-stack. (I remember it was in the
> > past of the thread around but not found)
> >
> > This should be included in the Andres' patch.
> >
> > - Give executor an ability to run from data-source (or driver)
> > nodes to the root.
> >
> > I'm not sure this is included, but I suppose he is aiming this
> > kind of thing.
> >
> > - Rebuid asynchronous execution on the upside-down executor.
>
> Anyway, I modified ExecProcNode into push-up form and it *seems*
> working to some extent. But trigger and cursors are almost broken
> and several other regressions fail. Some nodes such like
> windowagg are terriblly difficult to change to this push-up form
> (using state machine). And of course it is terribly inefficient.
>
> I'm afraid that all of this turns out to be in vain. But anyway,
> and FWIW, I'll show the work to here after some cleansing work on
> it.

So, this is that. Maybe this is really a bad way to go. Top of
the bads is it's terriblly hard to maintain because the behavior
of the state machine constructed in this patch is hardly
predictable so easily broken. During the 'cleansing work' I had
many crash or infinite-loop and they were a bit hard to
diagnose.. This will be soon broken by following commits.

Anyway and, again FWIW, this is that. I'll leave this for a while
(at least the period of this CF) and reconsider on async in
different forms.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
poc_pushexecutor_20170904_4faa1dc.tar.bz2	application/octet-stream	45.3 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: asynchronous execution
Date:	2017-10-20 08:37:07
Message-ID:	20171020.173707.12913619.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello.

Fully-asynchronous executor needs that every node is stateful and
suspendable at the time of requesting for the next tuples to
underneath nodes. I tried pure push-base executor but failed.

After the miserable patch upthread, I finally managed to make
executor nodes suspendable using computational jump and got rid
of recursive calls of executor. But it runs about x10 slower for
simple SeqScan case. (pgbench ran with 9% degradation.) It
doesn't seem recoverable by handy improvements. So I gave up
that.

Then I returned to single-level asynchrony, in other words, the
simple case with async-aware nodes just above async-capable
nodes. The motive of using the framework in the previous patch
was that we had degradation on the sync (or normal) code paths by
polluting ExecProcNode with async stuff and as Tom's suggestion
the node->ExecProcNode trick can isolate the async code path.

The attached PoC patch theoretically has no impact on the normal
code paths and just brings gain in async cases. (Additional
members in PlanState made degradation seemingly comes from
alignment, though.)

But I haven't had enough stable result from performance
test. Different builds from the same source code gives apparently
different results...

Anyway I'll show the best one in the several times run here.

original(ms) patched(ms) gain(%)
A: simple table scan : 9714.70 9656.73 0.6
B: local partitioning : 4119.44 4131.10 -0.3
C: single remote table : 9484.86 9141.89 3.7
D: sharding (single con) : 7114.34 6751.21 5.1
E: sharding (multi con) : 7166.56 1827.93 74.5

A and B are degradation checks, which are expected to show no
degradation. C is the gain only by postgres_fdw's command
presending on a remote table. D is the gain of sharding on a
connection. The number of partitions/shards is 4. E is the gain
using dedicate connection per shard.

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-core-side-modification.patch	text/x-patch	25.0 KB
0003-async-postgres_fdw.patch	text/x-patch	47.9 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] asynchronous execution
Date:	2017-12-11 11:07:53
Message-ID:	20171211.200753.191768178.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello,

At Fri, 20 Oct 2017 17:37:07 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20171020(dot)173707(dot)12913619(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> The attached PoC patch theoretically has no impact on the normal
> code paths and just brings gain in async cases.

The parallel append just committed hit this and the attached are
the rebased version to the current HEAD. The result of a concise
performance test follows.

patched(ms) unpatched(ms) gain(%)
A: simple table scan : 3562.32 3444.81 -3.4
B: local partitioning : 1451.25 1604.38 9.5
C: single remote table : 8818.92 9297.76 5.1
D: sharding (single con) : 5966.14 6646.73 10.2
E: sharding (multi con) : 1802.25 6515.49 72.3

> A and B are degradation checks, which are expected to show no
> degradation. C is the gain only by postgres_fdw's command
> presending on a remote table. D is the gain of sharding on a
> connection. The number of partitions/shards is 4. E is the gain
> using dedicate connection per shard.

Test A is accelerated by parallel sequential scan. Introducing
parallel append accelerates test B. Comparing A and B, I doubt
that degradation is stably measurable at least my environment but
I believe that there's no degradation theoreticaly. The test C to
E still shows apparent gain.
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-core-side-modification.patch	text/x-patch	29.1 KB
0003-async-postgres_fdw.patch	text/x-patch	48.0 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] asynchronous execution
Date:	2018-01-11 08:08:39
Message-ID:	20180111.170839.23674040.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Mon, 11 Dec 2017 20:07:53 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20171211(dot)200753(dot)191768178(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > The attached PoC patch theoretically has no impact on the normal
> > code paths and just brings gain in async cases.
>
> The parallel append just committed hit this and the attached are
> the rebased version to the current HEAD. The result of a concise
> performance test follows.
>
> patched(ms) unpatched(ms) gain(%)
> A: simple table scan : 3562.32 3444.81 -3.4
> B: local partitioning : 1451.25 1604.38 9.5
> C: single remote table : 8818.92 9297.76 5.1
> D: sharding (single con) : 5966.14 6646.73 10.2
> E: sharding (multi con) : 1802.25 6515.49 72.3
>
> > A and B are degradation checks, which are expected to show no
> > degradation. C is the gain only by postgres_fdw's command
> > presending on a remote table. D is the gain of sharding on a
> > connection. The number of partitions/shards is 4. E is the gain
> > using dedicate connection per shard.
>
> Test A is accelerated by parallel sequential scan. Introducing
> parallel append accelerates test B. Comparing A and B, I doubt
> that degradation is stably measurable at least my environment but
> I believe that there's no degradation theoreticaly. The test C to
> E still shows apparent gain.
> regards,

The patch conflicts with 3cac0ec. This is the rebased version.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-core-side-modification.patch	text/x-patch	29.1 KB
0003-async-postgres_fdw.patch	text/x-patch	48.0 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] asynchronous execution
Date:	2018-02-21 08:32:31
Message-ID:	20180221.173231.183249596.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At Thu, 11 Jan 2018 17:08:39 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180111(dot)170839(dot)23674040(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> At Mon, 11 Dec 2017 20:07:53 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20171211(dot)200753(dot)191768178(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> > > The attached PoC patch theoretically has no impact on the normal
> > > code paths and just brings gain in async cases.
> >
> > The parallel append just committed hit this and the attached are
> > the rebased version to the current HEAD. The result of a concise
> > performance test follows.
> >
> > patched(ms) unpatched(ms) gain(%)
> > A: simple table scan : 3562.32 3444.81 -3.4
> > B: local partitioning : 1451.25 1604.38 9.5
> > C: single remote table : 8818.92 9297.76 5.1
> > D: sharding (single con) : 5966.14 6646.73 10.2
> > E: sharding (multi con) : 1802.25 6515.49 72.3
> >
> > > A and B are degradation checks, which are expected to show no
> > > degradation. C is the gain only by postgres_fdw's command
> > > presending on a remote table. D is the gain of sharding on a
> > > connection. The number of partitions/shards is 4. E is the gain
> > > using dedicate connection per shard.
> >
> > Test A is accelerated by parallel sequential scan. Introducing
> > parallel append accelerates test B. Comparing A and B, I doubt
> > that degradation is stably measurable at least my environment but
> > I believe that there's no degradation theoreticaly. The test C to
> > E still shows apparent gain.
> > regards,
>
> The patch conflicts with 3cac0ec. This is the rebased version.

It hadn't been workable itself for a long time.

- Rebased to current master.
(Removed some wrongly-inserted lines)
- Fixed wrong-positioned assertion in postgres_fdw.c
(Caused assertion failure on normal usage)
- Properly reset persistent (static) variable.
(Caused SEGV under certain condition)
- Fixed explain output of async-mixed append plan.
(Choose proper subnode as the referent node)

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-core-side-modification.patch	text/x-patch	31.6 KB
0003-async-postgres_fdw.patch	text/x-patch	47.8 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] asynchronous execution
Date:	2018-05-11 08:45:20
Message-ID:	20180511.174520.188681124.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hello. This is the new version of $Subject.

But, this is not just a rebased version. On the way fixing
serious conflicts, I refactored patch and I believe this becomes
way readable than the previous shape.

# 0003 lacks changes of postgres_fdw.out now.

- Waiting queue manipulation is moved into new functions. It had
a bug that the same node can be inserted in the queue more than
once and it is fixed.

- postgresIterateForeginScan had somewhat a tricky strcuture to
merge similar procedures thus it cannot be said easy-to-read at
all. Now it is far simpler and straight-forward looking.

- Still this works only on Append/ForeignScan.

> > The attached PoC patch theoretically has no impact on the normal
> > code paths and just brings gain in async cases.

I performed almost the same test as before but with:

- partition tables
(There should be no difference with inheritance.)

- added test for fetch_size of 200 and 1000 as long as 100.

100 unreasonably magnifies the lag by context switching on
single poor box and the test D/F below. (They became faster by
about twice by adding a small delay (1000 times of
clock_gettime()(*1)) just before epoll_wait so that it doesn't
sleep (I suppose)...)

- Table size of test B is one tenth of the previous size, the
same to one partition.

*1: The reason for the function is that first I found that the
queries get way faster by just prefixing by "explain analyze"..

> patched(ms) unpatched(ms) gain(%)
> A: simple table scan : 3562.32 3444.81 -3.4
> B: local partitioning : 1451.25 1604.38 9.5
> C: single remote table : 8818.92 9297.76 5.1
> D: sharding (single con) : 5966.14 6646.73 10.2
> E: sharding (multi con) : 1802.25 6515.49 72.3

fetch_size = 100
patched(ms) unpatched(ms) gain(%)
A: simple table scan : 3033.48 2997.44 -1.2
B: local partitioning : 1405.52 1426.66 1.5
C: single remote table : 8335.50 8463.22 1.5
D: sharding (single con) : 6862.92 6820.97 -0.6
E: sharding (multi con) : 2185.84 6733.63 67.5
F: partition (single con): 6818.13 6741.01 -1.1
G: partition (multi con) : 2150.58 6407.46 66.4

fetch_size = 200
patched(ms) unpatched(ms) gain(%)
A: simple table scan :
B: local partitioning :
C: single remote table :
D: sharding (single con) :
E: sharding (multi con) :
F: partition (single con):
G: partition (multi con) :

fetch_size = 1000
patched(ms) unpatched(ms) gain(%)
A: simple table scan : 3050.31 2980.29 -2.3
B: local partitioning : 1401.34 1419.54 1.3
C: single remote table : 8375.4 8445.27 0.8
D: sharding (single con) : 3935.97 4737.84 16.9
E: sharding (multi con) : 1330.44 4752.87 72.0
F: partition (single con): 3997.63 4747.44 15.8
G: partition (multi con) : 1323.02 4807.72 72.5

Async append doesn't affect non-async path at all so B is
expected to get no degradation. It seems within error.

C and F are the gain when all foreign tables share one connection
and D and G are the gain when every foreign tables has dedicate
connection.

I will repost after filling the blank portion of the tables and
complete regression of the patch next week. Sorry for the
incomplete post.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-core-side-modification.patch	text/x-patch	47.8 KB
0003-async-postgres_fdw.patch	text/x-patch	48.7 KB

From:	Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To:	robertmhaas(at)gmail(dot)com
Cc:	tgl(at)sss(dot)pgh(dot)pa(dot)us, ah(at)cybertec(dot)at, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] asynchronous execution
Date:	2018-05-15 11:29:45
Message-ID:	20180515.202945.69332784.horiguchi.kyotaro@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

This gets further refactoring.

At Fri, 11 May 2018 17:45:20 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote in <20180511(dot)174520(dot)188681124(dot)horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
> But, this is not just a rebased version. On the way fixing
> serious conflicts, I refactored patch and I believe this becomes
> way readable than the previous shape.
>
> - Waiting queue manipulation is moved into new functions. It had
> a bug that the same node can be inserted in the queue more than
> once and it is fixed.
>
> - postgresIterateForeginScan had somewhat a tricky strcuture to
> merge similar procedures thus it cannot be said easy-to-read at
> all. Now it is far simpler and straight-forward looking.
>
> - Still this works only on Append/ForeignScan.

I performed almost the same test (again) as before but with some
new things:

- partition tables (There should be no difference with
inheritance and it actually looks so.)

- added test for fetch_size of 200 and 1000 as long as 100.

Fetch size of 100 seems unreasonably magnifies the lag by
context switching on single poor box for the test D/F
below. They became faster by about twice by *adding* a small
delay (1000 times of clock_gettime()(*1)) just before
epoll_wait. Things would be different on separate machines but
I'm not sure it really is. I don't find the exact cause nor how
to avoid it.

*1: The reason for the function is that I found at first that the
queries get way faster by just prefixing by "explain
analyze"..

Async append (theoretically) no longer affects non-async path at
all so B is expected to get no degradation. It seems within
error.

C and F are the gain when all foreign tables share one connection
and D and G are the gain when every foreign tables has dedicate
connection.

(previous numbers)
> patched(ms) unpatched(ms) gain(%)
> A: simple table scan : 3562.32 3444.81 -3.4
> B: local partitioning : 1451.25 1604.38 9.5
> C: single remote table : 8818.92 9297.76 5.1
> D: sharding (single con) : 5966.14 6646.73 10.2
> E: sharding (multi con) : 1802.25 6515.49 72.3

fetch_size = 100
patched(ms) unpatched(ms) gain(%)
A: simple table scan : 3065.82 3046.82 -0.62
B: local partitioning : 1393.98 1378.00 -1.16
C: single remote table : 8499.73 8595.66 1.12
D: sharding (single con) : 9267.85 9251.59 -0.18
E: sharding (multi con) : 2567.02 9295.22 72.38
F: partition (single con): 9241.08 9060.19 -2.00
G: partition (multi con) : 2548.86 9419.18 72.94

fetch_size = 200
patched(ms) unpatched(ms) gain(%)
A: simple table scan : 3067.08 2999.23 -2.3
B: local partitioning : 1392.07 1384.49 -0.5
C: single remote table : 8521.72 8505.48 -0.2
D: sharding (single con) : 6752.81 7076.02 4.6
E: sharding (multi con) : 1958.2 7188.02 72.8
F: partition (single con): 6756.72 7000.72 3.5
G: partition (multi con) : 1969.8 7228.85 72.8

fetch_size = 1000
patched(ms) unpatched(ms) gain(%)
A: simple table scan : 4547.44 4519.34 -0.62
B: local partitioning : 2880.66 2739.43 -5.16
C: single remote table : 8448.04 8572.15 1.45
D: sharding (single con) : 2405.01 5919.31 59.37
E: sharding (multi con) : 1872.15 5963.04 68.60
F: partition (single con): 2369.08 5960.81 60.26
G: partition (multi con) : 1854.69 5893.65 68.53

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment	Content-Type	Size
0001-Allow-wait-event-set-to-be-registered-to-resource-ow.patch	text/x-patch	9.4 KB
0002-infrastructure-for-asynchronous-execution.patch	text/x-patch	41.8 KB
0003-async-postgres_fdw.patch	text/x-patch	55.8 KB