Re: Extensible executor nodes for preparation of SQL/MED

Lists: pgsql-hackers
From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Extensible executor nodes for preparation of SQL/MED
Date: 2010-10-25 11:16:49
Message-ID: AANLkTim+F5uUEyS0OZxjxQ=3B6g2iwSnqXy+qmVHjCX-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

SQL/MED will have some kinds of planner hooks to support FDW-depending
plan execution. Then, we will need to support user-defined executor nodes.
The proposed SQL/MED has own "executor node hooks" in ForeignTableScan,
http://wiki.postgresql.org/wiki/SQL/MED#Executor
but I think it will be cleaner to support it in executor level.

The attached patch is an experimental code to do it; Plan struct has
"vtable" field as a set of functions to generate and execute PlanState
nodes. It changes large switch-case blocks in the current executor
into function-pointer calls as like as virtual functions in C++.

Is it worth doing? If we will go to the direction, I'll continue to
research it, like extensibility of Path nodes and EXPLAIN support.

-------- Essence of the patch --------
typedef struct Plan
{
NodeTag type;
PlanVTable *vtable; /* executor procs */
...

struct PlanVTable
{
ExecInitNode_type InitNode;
ExecProcNode_type ProcNode;
MultiProcNode_type MultiProcNode;
ExecEndNode_type EndNode;
...

make_seqscan()
{
node = makeNode(SeqScan);
node->vtable = &SeqScanVTable;
...

ExecReScan(node)
{
node->plan->vtable->ReScan(node);
...

--------

--
Itagaki Takahiro

Attachment Content-Type Size
extensible_execnodes-20101025.patch.gz application/x-gzip 12.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-10-25 15:28:16
Message-ID: 8577.1288020496@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> writes:
> SQL/MED will have some kinds of planner hooks to support FDW-depending
> plan execution. Then, we will need to support user-defined executor nodes.
> The proposed SQL/MED has own "executor node hooks" in ForeignTableScan,
> http://wiki.postgresql.org/wiki/SQL/MED#Executor
> but I think it will be cleaner to support it in executor level.

I think the argument that this is good for FDW is bogus: there is
no evidence whatsoever that we need add-on plan node types, and if
we did need them, we'd need a whole lot more infrastructure than
what you're sketching (see EXPLAIN for instance, not to mention
how will the planner generate them in the first place).

But it might be a good change anyway from a performance standpoint,
in case a call through a function pointer is faster than a big switch.
Have you tried benchmarking it on common platforms?

One comment is that as sketched, this requires two extra levels of
indirection at runtime, for no particular benefit that I can see.
It'd be better to put the function pointers right in the planstate
nodes, at least for the most common case of ExecProcNode.

regards, tom lane


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-10-26 03:21:32
Message-ID: AANLkTimapwCybA_5WMNZmATR6jA1czQpbNz4B+7mKQoP@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 26, 2010 at 12:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> But it might be a good change anyway from a performance standpoint,
> in case a call through a function pointer is faster than a big switch.
> Have you tried benchmarking it on common platforms?

I didn't intend performance, but there is small but measurable win
in it if I avoided indirections. We might not always need to copy
a whole vtable into planstate; only ExecProcNode might be enough.
I'll continue the research.

24957.767 ms : master (a big switch)
25059.838 ms : two indirections (planstate->plan->vtable->fn)
24819.298 ms : one indirection (planstate->plan->vtable.fn)
24118.436 ms : direct call (planstate->vtable.fn)

So, major benefits of the change might be performance and code refactoring.
Does anyone have comments about it for the functionality? It might also be
used by SQL/MED and executor hooks, but I have no specific idea yet.

[measuring settings and queries]
=# SHOW shared_buffers;
shared_buffers
----------------
512MB

=# CREATE TABLE tbl AS SELECT i FROM generate_series(1, 10000000) AS t(i);
=# VACUUM ANALYZE tbl;
=# SELECT count(*), pg_size_pretty(pg_relation_size('tbl')) FROM tbl;
=# CREATE FUNCTION test(n integer) RETURNS void LANGUAGE plpgsql AS
$$
DECLARE
i integer;
r bigint;
BEGIN
FOR i IN 1..n LOOP
SELECT count(*) INTO r FROM tbl;
END LOOP;
END;
$$;

=# \timing
=# SELECT test(30);

--
Itagaki Takahiro


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-10-26 09:23:37
Message-ID: AANLkTikF7A4WXw+GTjVaG9PqFs0sC7rokwgfWGmviS2Z@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a WIP patch to extensible executor nodes.

It replaces large switch blocks with function-pointer calls (renamed to
PlanProcs from VTable). It has small performance win (Please test it!)
and capsulize some details of executor nodes, but is not perfect.

The infrastructure might be used by SQL/MED, but then not only PlanState
but also Plan and Path are required to be customizable. Complete cleanup
would be difficult, but I'm trying to find common ground for existing
postgres' implementation and extensible planner and executor.
Comments and suggestions welcome.

On Tue, Oct 26, 2010 at 12:21 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> I didn't intend performance, but there is small but measurable win
> in it if I avoided indirections. We might not always need to copy
> a whole vtable into planstate; only ExecProcNode might be enough.
> I'll continue the research.
>
> 24957.767 ms : master (a big switch)
> 25059.838 ms : two indirections (planstate->plan->vtable->fn)
> 24819.298 ms : one indirection (planstate->plan->vtable.fn)
> 24118.436 ms : direct call (planstate->vtable.fn)
>
> So, major benefits of the change might be performance and code refactoring.
> Does anyone have comments about it for the functionality? It might also be
> used by SQL/MED and executor hooks, but I have no specific idea yet.

--
Itagaki Takahiro

Attachment Content-Type Size
extensible_execnodes-20101026.patch.gz application/x-gzip 12.6 KB

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-10-26 18:54:47
Message-ID: AANLkTin7kZ-VCzANv3Jc-6Vyw=44hVgNwniNZyT47-bS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 8:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> But it might be a good change anyway from a performance standpoint,
> in case a call through a function pointer is faster than a big switch.
> Have you tried benchmarking it on common platforms?

I've always wondered why we didn't use function pointers. It seems
like it would make the code a lot easier to maintain with fewer places
that need to be updated every time we add a node.

But I always assumed a big part of the reason was performance.
Generally compilers hate optimizing code using function pointers. They
pretty much kill any inter-procedural optimization since it's very
hard to figure out what set of functions you might have called and
make any deductions about what side effects those functions might or
might not have had. Even at the chip level function pointers tend to
be slow since they cause full pipeline stalls where the processor has
no idea where the next instruction is coming from until it's finished
loading the data from the previous instruction.

On the other hand of course it's not obvious what algorithm gcc should
use to implement largish switch statements like these. It might be
doing a fairly large number of operations doing a binary search or
hash lookup to determine which branch to take.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-10-26 19:18:53
Message-ID: 1067.1288120733@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Mon, Oct 25, 2010 at 8:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> But it might be a good change anyway from a performance standpoint,
>> in case a call through a function pointer is faster than a big switch.
>> Have you tried benchmarking it on common platforms?

> I've always wondered why we didn't use function pointers. It seems
> like it would make the code a lot easier to maintain with fewer places
> that need to be updated every time we add a node.

> But I always assumed a big part of the reason was performance.
> Generally compilers hate optimizing code using function pointers. They
> pretty much kill any inter-procedural optimization since it's very
> hard to figure out what set of functions you might have called and
> make any deductions about what side effects those functions might or
> might not have had. Even at the chip level function pointers tend to
> be slow since they cause full pipeline stalls where the processor has
> no idea where the next instruction is coming from until it's finished
> loading the data from the previous instruction.

At least in the case of the plan-node-related switches, the called
functions tend to be big and ugly enough that it's really hard to credit
any meaningful inter-procedural optimization could happen. Side effects
would have to be "pretty much everything".

> On the other hand of course it's not obvious what algorithm gcc should
> use to implement largish switch statements like these. It might be
> doing a fairly large number of operations doing a binary search or
> hash lookup to determine which branch to take.

I confess to not having actually examined the assembly code anytime
recently, but I'd always supposed it would be an array-lookup anytime
the set of case labels was reasonably dense, which it should be in these
cases. So I'm not sure I believe the pipeline stall argument either.
Any way you slice it, there's going to be a call to a function that
is going to be really really hard to predict in advance --- unless of
course you rely on statistics like "where did I jump to the last few
times I was here", which I think modern CPUs do have the ability to
do.

But this is all just arm-waving of course. Benchmarks would be a lot
more convincing.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-11-16 18:28:50
Message-ID: 2673.1289932130@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> writes:
> Here is a WIP patch to extensible executor nodes.

I am of the opinion that a run-time-extensible set of plan node types
is merest fantasy. We will never have that, so putting in place 5%
of the infrastructure for it is a waste of time and notational
complexity.

I would support replacing the switch in ExecProcNode with a
function-pointer call on performance grounds. The implementation of
that IMHO should match what's done for runtime expression state nodes:
set up the function pointer in the InitNode function that creates the
executor state node. But the rest of this patch, particularly the
addition of vtables to plan nodes, is pointless complication.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-11-16 18:36:16
Message-ID: AANLkTikymQyOuRUySSL24gOgey7i=W5ENR_BnjeTN7s9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 16, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com> writes:
>> Here is a WIP patch to extensible executor nodes.
>
> I am of the opinion that a run-time-extensible set of plan node types
> is merest fantasy.  We will never have that, so putting in place 5%
> of the infrastructure for it is a waste of time and notational
> complexity.

I think I agree; and moreover there's been no compelling argument made
why we would need that for SQL/MED anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-11-17 01:51:14
Message-ID: AANLkTinjYzPveoXhG68m+U5E=eXP8hBZ6Freftcn3Aym@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 17, 2010 at 03:36, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Nov 16, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I am of the opinion that a run-time-extensible set of plan node types
>> is merest fantasy.  We will never have that, so putting in place 5%
>> of the infrastructure for it is a waste of time and notational
>> complexity.
>
> I think I agree; and moreover there's been no compelling argument made
> why we would need that for SQL/MED anyway.

I see. I'll cut useless parts from my patch.

--
Itagaki Takahiro


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-11-17 07:13:36
Message-ID: AANLkTinD-SJ5iou0Ap6SKWdP9W+sPzeXhjuDWM+Rawuj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 17, 2010 at 10:51, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Wed, Nov 17, 2010 at 03:36, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Nov 16, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I am of the opinion that a run-time-extensible set of plan node types
>>> is merest fantasy.  We will never have that, so putting in place 5%
>>> of the infrastructure for it is a waste of time and notational
>>> complexity.
>>
>> I think I agree; and moreover there's been no compelling argument made
>> why we would need that for SQL/MED anyway.
>
> I see. I'll cut useless parts from my patch.

I tested simplified version, but I cannot see measurable performance
improvement at this time. So, I'll turn down the whole proposal
to use function pointer calls. I'm sorry for all the fuss.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Extensible executor nodes for preparation of SQL/MED
Date: 2010-11-17 17:18:04
Message-ID: AANLkTikOwm+16ef+132txwRR=6DjW12AkC0hJL=HO+Wc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 17, 2010 at 2:13 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Wed, Nov 17, 2010 at 10:51, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> On Wed, Nov 17, 2010 at 03:36, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Tue, Nov 16, 2010 at 1:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>> I am of the opinion that a run-time-extensible set of plan node types
>>>> is merest fantasy.  We will never have that, so putting in place 5%
>>>> of the infrastructure for it is a waste of time and notational
>>>> complexity.
>>>
>>> I think I agree; and moreover there's been no compelling argument made
>>> why we would need that for SQL/MED anyway.
>>
>> I see. I'll cut useless parts from my patch.
>
> I tested simplified version, but I cannot see measurable performance
> improvement at this time. So, I'll turn down the whole proposal
> to use function pointer calls. I'm sorry for all the fuss.

Wait a minute... I'm confused. Didn't you have a measurable
performance improvement with an earlier version of this patch? If
taking out the "useless" parts removed the performance benefit, maybe
they weren't useless?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company