ECPG patch to use prepare for improved performance

Lists: pgsql-patches
From: "William Lawrance" <bill(dot)lawrance(at)bull(dot)com>
To: "Pgsql-Patches" <pgsql-patches(at)postgresql(dot)org>
Subject: ECPG patch to use prepare for improved performance
Date: 2007-05-07 21:46:29
Message-ID: BNEIKJMOJGCEDBNCBHEGAEGECFAA.bill.lawrance@bull.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared"
functions to cause SQL statements from ECPG to be cached. It does
this without requiring any changes in the user's source program.

It was developed during the preparation for a benchmark for a
large customer. This benchmark consists of several hundred programs
containing several thousand embedded SQL statements. The benchmark
has been successfully executed using Oracle, DB2, and PostgreSQL.
In the benchmark, Postgres is shown to be slower, by far, than the
other DBMS systems. In a three hour execution, using this patch,
approximately 30% was saved.

The following approach is used:

Within the "execute.c" module, routines are added to manage a cache
of prepared statements. These routines are used to search, insert,
and delete entries in the cache. The key for these cache entries is
the text of the SQL statement as passed by ECPG from the application
program.

Within the same module, the "ECPGexecute" function was replaced.
This is the function that is called to execute a statement after
some preliminary housekeeping is done. The original "ECPGexecute"
function constructs an ASCII string by replacing each host variable
with its current value and then calling "PQexec". The new
"ECPGexecute" function does the following:

- build an array of the current values of the host variables.

- search the cache for an entry indicating that this statement
has already been prepare'd, via "PQprepare"

- If no entry was found in the previous step, call "PQprepare"
for the statement and then insert an entry for it into the
cache. If this requires an entry to be re-used, execute a
"DEALLOCATE PREPARE.." for the previous contents.

- At this point, the SQL statement has been prepare'd by PQlib,
either when the statement was executed in the past, or in
the previous step.

- call "PQexecPrepared", using the array of parameters built
in the first step above.

Attachment Content-Type Size
ecpg.patch application/octet-stream 26.4 KB

From: Michael Meskes <meskes(at)postgresql(dot)org>
To: William Lawrance <bill(dot)lawrance(at)bull(dot)com>
Cc: Pgsql-Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-09 09:04:03
Message-ID: 20070509090403.GB18105@feivel.credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Mon, May 07, 2007 at 02:46:29PM -0700, William Lawrance wrote:
> This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared"
> functions to cause SQL statements from ECPG to be cached. It does
> this without requiring any changes in the user's source program.
> ...

I still do not understand why you prepare each statement. This might
help you with your test case, but I don't like to add this as a general
rule. If a user wants a prepared statement he/she should use the prepare
statement. I agree that the prepare logic has to be rewritten and this
is high on my agenda, but I will probably only do this for statements
issued with EXEC SQL PREPARE not for every single statement.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes(at)jabber(dot)org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!


From: "William Lawrance" <bill(dot)lawrance(at)bull(dot)com>
To: "Michael Meskes" <meskes(at)postgresql(dot)org>
Cc: "Pgsql-Patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-09 20:12:17
Message-ID: BNEIKJMOJGCEDBNCBHEGCEHJCFAA.bill.lawrance@bull.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

This approach was used for several reasons--

1. No changes were required in the application source program. For
an application involving thousands of SQL statements in hundreds
of programs, this is important. This customer application has
been tuned extensively by the customer for DB2, and he is not
receptive to large changes.

2. The performance was improved by about 1 hour in the 3 hour
elapsed time of the application. This is important to the
customer in terms of accomplishing his work load in the
time that has been allotted, based on his experience with DB2.
Without this improvement, he is likely to consider it too slow.

I would like to emphasize that we aren't measuring an artificial
test program; this is a real customer's application. We loaded
7 million rows into 217 tables to run the application. I believe
it is representative of many real batch applications.

Is there reason not to prepare each statement?

Could it be predicated upon a user supplied option ?

Other comments ?

-----Original Message-----
From: Michael Meskes [mailto:meskes(at)postgresql(dot)org]
Sent: Wednesday, May 09, 2007 2:04 AM
To: William Lawrance
Cc: Pgsql-Patches
Subject: Re: [PATCHES] ECPG patch to use prepare for improved
performance

On Mon, May 07, 2007 at 02:46:29PM -0700, William Lawrance wrote:
> This patch for ECPG utilizes the "PQprepare" and "PQexecPrepared"
> functions to cause SQL statements from ECPG to be cached. It does
> this without requiring any changes in the user's source program.
> ...

I still do not understand why you prepare each statement. This might
help you with your test case, but I don't like to add this as a general
rule. If a user wants a prepared statement he/she should use the prepare
statement. I agree that the prepare logic has to be rewritten and this
is high on my agenda, but I will probably only do this for statements
issued with EXEC SQL PREPARE not for every single statement.

Michael


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: William Lawrance <bill(dot)lawrance(at)bull(dot)com>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, Pgsql-Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-09 20:26:36
Message-ID: 20070509202636.GX4504@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

William Lawrance wrote:
> This approach was used for several reasons--
>
> 1. No changes were required in the application source program. For
> an application involving thousands of SQL statements in hundreds
> of programs, this is important. This customer application has
> been tuned extensively by the customer for DB2, and he is not
> receptive to large changes.
>
> 2. The performance was improved by about 1 hour in the 3 hour
> elapsed time of the application. This is important to the
> customer in terms of accomplishing his work load in the
> time that has been allotted, based on his experience with DB2.
> Without this improvement, he is likely to consider it too slow.
>
> I would like to emphasize that we aren't measuring an artificial
> test program; this is a real customer's application. We loaded
> 7 million rows into 217 tables to run the application. I believe
> it is representative of many real batch applications.
>
>
> Is there reason not to prepare each statement?

One reason is that prepared statements have the parameters passed out of
line after the planning is done, so in certain cases the optimizer makes
a different choice which leads to worse plans.

This used to be a problem with JDBC as well, until a workaround was
added so that the "unnamed" prepared statement is not planned until the
parameters are passed. If you don't do that, it may end up being a bad
choice for applications as well.

> Other comments ?

Codewise I noticed you wrote your own hashing function, which seemed odd
to me at first sight. We already have a hashing infrastructure, but I'm
not sure if it could be used in ECPG (mainly due to lack of ereport/elog
support).

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Michael Meskes <meskes(at)postgresql(dot)org>
To: William Lawrance <bill(dot)lawrance(at)bull(dot)com>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, Pgsql-Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-10 10:00:44
Message-ID: 20070510100044.GB409@feivel.credativ.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote:
> 2. The performance was improved by about 1 hour in the 3 hour
> elapsed time of the application. This is important to the
> customer in terms of accomplishing his work load in the
> time that has been allotted, based on his experience with DB2.
> Without this improvement, he is likely to consider it too slow.

But this only holds for one customer. I don't think this will hold for
every single application. At least I do not see a reason why this
should hold everytime.

> I would like to emphasize that we aren't measuring an artificial
> test program; this is a real customer's application. We loaded
> 7 million rows into 217 tables to run the application. I believe
> it is representative of many real batch applications.

But how about non-batch applications?

> Is there reason not to prepare each statement?

I'm completely against forcing such a design decision on the programmer.
Hopefully I will be able to add a real prepare statement soon.

> Could it be predicated upon a user supplied option ?

Yes, this is fine with me. If you could rearrange the patch I will test
and commit it.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes(at)jabber(dot)org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!


From: "William Lawrance" <bill(dot)lawrance(at)bull(dot)com>
To: "Michael Meskes" <meskes(at)postgresql(dot)org>
Cc: "Pgsql-Patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-10 21:54:39
Message-ID: BNEIKJMOJGCEDBNCBHEGAEIECFAA.bill.lawrance@bull.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


This updated patch for ECPG uses the current routines by
default. If an environment variable (ECPGUSEPREPARE) is set
to "yes", it uses the new routine that prepares and
caches each statement.

-----Original Message-----
From: Michael Meskes [mailto:meskes(at)postgresql(dot)org]
Sent: Thursday, May 10, 2007 3:01 AM
To: William Lawrance
Cc: Michael Meskes; Pgsql-Patches
Subject: Re: [PATCHES] ECPG patch to use prepare for improved
performance

On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote:
> 2. The performance was improved by about 1 hour in the 3 hour
> elapsed time of the application. This is important to the
> customer in terms of accomplishing his work load in the
> time that has been allotted, based on his experience with DB2.
> Without this improvement, he is likely to consider it too slow.

But this only holds for one customer. I don't think this will hold for
every single application. At least I do not see a reason why this
should hold everytime.

> I would like to emphasize that we aren't measuring an artificial
> test program; this is a real customer's application. We loaded
> 7 million rows into 217 tables to run the application. I believe
> it is representative of many real batch applications.

But how about non-batch applications?

> Is there reason not to prepare each statement?

I'm completely against forcing such a design decision on the programmer.
Hopefully I will be able to add a real prepare statement soon.

> Could it be predicated upon a user supplied option ?

Yes, this is fine with me. If you could rearrange the patch I will test
and commit it.

Michael
--
Michael Meskes
Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes(at)jabber(dot)org
Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!

Attachment Content-Type Size
ecpg.patch application/octet-stream 28.8 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: William Lawrance <bill(dot)lawrance(at)bull(dot)com>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, Pgsql-Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-10 22:00:07
Message-ID: 464395E7.6040204@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


This seems like a very all or nothing approach. By contrast, the Perl
DBD::Pg driver lets you decide per statement if you want it
server-prepared or not. Is that not possible?

cheers

andrew

William Lawrance wrote:
> This updated patch for ECPG uses the current routines by
> default. If an environment variable (ECPGUSEPREPARE) is set
> to "yes", it uses the new routine that prepares and
> caches each statement.
>
>
>
>
> -----Original Message-----
> From: Michael Meskes [mailto:meskes(at)postgresql(dot)org]
> Sent: Thursday, May 10, 2007 3:01 AM
> To: William Lawrance
> Cc: Michael Meskes; Pgsql-Patches
> Subject: Re: [PATCHES] ECPG patch to use prepare for improved
> performance
>
>
> On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote:
>
>> 2. The performance was improved by about 1 hour in the 3 hour
>> elapsed time of the application. This is important to the
>> customer in terms of accomplishing his work load in the
>> time that has been allotted, based on his experience with DB2.
>> Without this improvement, he is likely to consider it too slow.
>>
>
> But this only holds for one customer. I don't think this will hold for
> every single application. At least I do not see a reason why this
> should hold everytime.
>
>
>> I would like to emphasize that we aren't measuring an artificial
>> test program; this is a real customer's application. We loaded
>> 7 million rows into 217 tables to run the application. I believe
>> it is representative of many real batch applications.
>>
>
> But how about non-batch applications?
>
>
>> Is there reason not to prepare each statement?
>>
>
> I'm completely against forcing such a design decision on the programmer.
> Hopefully I will be able to add a real prepare statement soon.
>
>
>> Could it be predicated upon a user supplied option ?
>>
>
> Yes, this is fine with me. If you could rearrange the patch I will test
> and commit it.
>
> Michael
>
> ------------------------------------------------------------------------
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: William Lawrance <bill(dot)lawrance(at)bull(dot)com>
Cc: Michael Meskes <meskes(at)postgresql(dot)org>, Pgsql-Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: ECPG patch to use prepare for improved performance
Date: 2007-05-13 11:45:17
Message-ID: 200705131145.l4DBjHA11220@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

William Lawrance wrote:
>
> This updated patch for ECPG uses the current routines by
> default. If an environment variable (ECPGUSEPREPARE) is set
> to "yes", it uses the new routine that prepares and
> caches each statement.
>
>
>
>
> -----Original Message-----
> From: Michael Meskes [mailto:meskes(at)postgresql(dot)org]
> Sent: Thursday, May 10, 2007 3:01 AM
> To: William Lawrance
> Cc: Michael Meskes; Pgsql-Patches
> Subject: Re: [PATCHES] ECPG patch to use prepare for improved
> performance
>
>
> On Wed, May 09, 2007 at 01:12:17PM -0700, William Lawrance wrote:
> > 2. The performance was improved by about 1 hour in the 3 hour
> > elapsed time of the application. This is important to the
> > customer in terms of accomplishing his work load in the
> > time that has been allotted, based on his experience with DB2.
> > Without this improvement, he is likely to consider it too slow.
>
> But this only holds for one customer. I don't think this will hold for
> every single application. At least I do not see a reason why this
> should hold everytime.
>
> > I would like to emphasize that we aren't measuring an artificial
> > test program; this is a real customer's application. We loaded
> > 7 million rows into 217 tables to run the application. I believe
> > it is representative of many real batch applications.
>
> But how about non-batch applications?
>
> > Is there reason not to prepare each statement?
>
> I'm completely against forcing such a design decision on the programmer.
> Hopefully I will be able to add a real prepare statement soon.
>
> > Could it be predicated upon a user supplied option ?
>
> Yes, this is fine with me. If you could rearrange the patch I will test
> and commit it.
>
> Michael
> --
> Michael Meskes
> Email: Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
> ICQ: 179140304, AIM/Yahoo: michaelmeskes, Jabber: meskes(at)jabber(dot)org
> Go SF 49ers! Go Rhein Fire! Use Debian GNU/Linux! Use PostgreSQL!
[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +