Re: exec_execute_message crash

Lists: pgsql-hackers
From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: exec_execute_message crush
Date: 2009-12-29 01:06:08
Message-ID: 20091229.100608.37592217.t-ishii@sraoss.co.jp

While investigating a complaint from a pgpool user, I found that
PostgreSQL crashes with the following stack trace:

#0 0x0826436a in list_length (l=0xaabe4e28)
at ../../../src/include/nodes/pg_list.h:94
#1 0x08262168 in IsTransactionStmtList (parseTrees=0xaabe4e28)
at postgres.c:2429
#2 0x0826132e in exec_execute_message (portal_name=0x857bab0 "", max_rows=0)
at postgres.c:1824
#3 0x08263b2a in PostgresMain (argc=4, argv=0x84f6c28,
username=0x84f6b08 "t-ishii") at postgres.c:3671
#4 0x0823299e in BackendRun (port=0x8511e68) at postmaster.c:3449
#5 0x08231f78 in BackendStartup (port=0x8511e68) at postmaster.c:3063
#6 0x0822f90a in ServerLoop () at postmaster.c:1387
#7 0x0822f131 in PostmasterMain (argc=3, argv=0x84f4bf8) at postmaster.c:1040
#8 0x081c6217 in main (argc=3, argv=0x84f4bf8) at main.c:188

This happens with the following sequence of extended-protocol messages:

parse
bind
describe
execute
<completes normally>
parse (invalid SQL, which aborts the transaction)
bind (error)
describe (error)
execute (crash)

exec_execute_message crashes here:

/* Does the portal contain a transaction command? */
is_xact_command = IsTransactionStmtList(portal->stmts);

Looking at the portal:

$5 = {name = 0x85727bc "", prepStmtName = 0x0, heap = 0x8596798, resowner = 0x0,
cleanup = 0, createSubid = 1,
sourceText = 0x859ac78 " SELECT NULL AS TABLE_CAT, n.nspname AS TABLE_SCHEM, ct.relname AS TABLE_NAME, a.attname AS COLUMN_NAME, a.attnum AS KEY_SEQ, ci.relname AS PK_NAME FROM pg_catalog.pg_namespace n, pg_catalog.pg_c"...,
commandTag = 0x84682aa "SELECT", stmts = 0xaabe4e28, cplan = 0x0,
portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4,
status = PORTAL_READY, queryDesc = 0x0, tupDesc = 0x85db060,
formats = 0x859b0c8, holdStore = 0x0, holdContext = 0x0, atStart = 1 '\001',
atEnd = 1 '\001', posOverflow = 0 '\0', portalPos = 0,
creation_time = 315313855337710, visible = 1 '\001'}

The problem is that stmts points to an invalid memory address:

(gdb) p *portal->stmts
Cannot access memory at address 0xaabe4e28

It seems the source of the problem is that exec_execute_message tries to
execute the unnamed portal, whose unnamed statement has already
gone away.

Please note that without pgpool the backend does not crash. I think
this is because the JDBC driver does not call execute() if a prior
parse, bind, etc. failed.

The crash happens with PostgreSQL 8.3.8, 8.3.9 and 8.4.2.

Any thoughts?
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 01:36:37
Message-ID: 5190.1262050597@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> It seems the source of the problem is that exec_execute_message tries to
> execute the unnamed portal, whose unnamed statement has already
> gone away.

Could we see an actual test case?

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 01:52:30
Message-ID: 20091229.105230.96920423.t-ishii@sraoss.co.jp

> Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> > It seems the source of the problem is that exec_execute_message tries to
> > execute the unnamed portal, whose unnamed statement has already
> > gone away.
>
> Could we see an actual test case?

If you don't mind using pgpool, that would be possible. If not, I need
to write a small program that speaks the frontend/backend protocol
directly. Which would you prefer?
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 02:30:04
Message-ID: 5747.1262053804@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
>> Could we see an actual test case?

> If you don't mind using pgpool, that would be possible. If not, I need
> to write a small program that speaks the frontend/backend protocol
> directly. Which would you prefer?

Hm, can't you get libpq to do it?

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 02:35:52
Message-ID: 20091229.113552.48667790.t-ishii@sraoss.co.jp

> > If you don't mind using pgpool, that would be possible. If not, I need
> > to write a small program that speaks the frontend/backend protocol
> > directly. Which would you prefer?
>
> Hm, can't you get libpq to do it?

That depends on how "intelligent" libpq is :-) Let me try...

Another idea is a "packet recorder", which could record the packets sent from
pgpool to PostgreSQL and replay them. I can't recall one offhand, but
I vaguely remember that something like that exists.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 03:47:55
Message-ID: 20091229.124755.51301287.t-ishii@sraoss.co.jp

> > Hm, can't you get libpq to do it?
>
> That depends on how "intelligent" libpq is :-) Let me try...
>
> Another idea is a "packet recorder", which could record the packets sent from
> pgpool to PostgreSQL and replay them. I can't recall one offhand, but
> I vaguely remember that something like that exists.

It seems we can't get libpq to do it: libpq does not provide a
function that sends Bind alone. As I understand it,
PQexecPrepared does Bind + Execute.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 04:02:20
Message-ID: 9290.1262059340@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
>>> Hm, can't you get libpq to do it?

> It seems we can't get libpq to do it: libpq does not provide a
> function that sends Bind alone. As I understand it,
> PQexecPrepared does Bind + Execute.

The event sequence you mentioned had bind followed by execute, so
I'm not seeing the problem.

(In any case, some kind of quick lobotomy in libpq would be easier
than writing a standalone test program, no?)

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 04:07:43
Message-ID: 20091229.130743.00076757.t-ishii@sraoss.co.jp

> (In any case, some kind of quick lobotomy in libpq would be easier
> than writing a standalone test program, no?)

Sounds like a nice idea.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Kris Jurka <books(at)ejurka(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 05:49:19
Message-ID: alpine.BSO.2.00.0912290043330.25395@leary.csoft.net

On Tue, 29 Dec 2009, Tatsuo Ishii wrote:

> parse
> bind
> describe
> execute
> <completes normally>
> parse (invalid SQL, which aborts the transaction)
> bind (error)
> describe (error)
> execute (crash)
>
> Please note that without pgpool the backend does not crash. I think
> this is because the JDBC driver does not call execute() if a prior
> parse, bind, etc. failed.

The JDBC driver will fire away parse, bind, and execute all at once before
a sync, to avoid network roundtrips, so your assumption of what's going on
here without pgpool doesn't seem accurate. Attached is a test case that
tries to duplicate what you've described and it errors out normally.
Below is the JDBC driver's protocol level logging.

21:41:39.407 (1) FE=> Parse(stmt=S_1,query="BEGIN",oids={})
21:41:39.407 (1) FE=> Bind(stmt=S_1,portal=null)
21:41:39.407 (1) FE=> Execute(portal=null,limit=0)
21:41:39.408 (1) FE=> Parse(stmt=null,query="SELECT $1 ",oids={23})
21:41:39.408 (1) FE=> Bind(stmt=null,portal=null,$1=<'1'>)
21:41:39.408 (1) FE=> Describe(portal=null)
21:41:39.408 (1) FE=> Execute(portal=null,limit=0)
21:41:39.408 (1) FE=> Parse(stmt=null,query=" SELECT SELECT $1 ",oids={23})
21:41:39.408 (1) FE=> Bind(stmt=null,portal=null,$1=<'2'>)
21:41:39.409 (1) FE=> Describe(portal=null)
21:41:39.409 (1) FE=> Execute(portal=null,limit=0)
21:41:39.409 (1) FE=> Sync
21:41:39.443 (1) <=BE ParseComplete [S_1]
21:41:39.443 (1) <=BE BindComplete [null]
21:41:39.443 (1) <=BE CommandStatus(BEGIN)
21:41:39.443 (1) <=BE ParseComplete [null]
21:41:39.443 (1) <=BE BindComplete [null]
21:41:39.444 (1) <=BE RowDescription(1)
21:41:39.444 (1) <=BE DataRow
21:41:39.444 (1) <=BE CommandStatus(SELECT)
21:41:39.454 (1) <=BE ErrorMessage(ERROR: syntax error at or near "SELECT"
  Position: 9)

So this shows everything working as expected. Perhaps enabling this
logging on your JDBC client would show more clearly what it is trying to
do.

Kris Jurka

Attachment Content-Type Size
Crash.java text/plain 453 bytes

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: books(at)ejurka(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crush
Date: 2009-12-29 06:22:31
Message-ID: 20091229.152231.68307810.t-ishii@sraoss.co.jp

> > parse
> > bind
> > describe
> > execute
> > <completes normally>
> > parse (invalid SQL, which aborts the transaction)
> > bind (error)
> > describe (error)
> > execute (crash)
> >
> > Please note that without pgpool the backend does not crash. I think
> > this is because the JDBC driver does not call execute() if a prior
> > parse, bind, etc. failed.
>
> The JDBC driver will fire away parse, bind, and execute all at once before
> a sync, to avoid network roundtrips, so your assumption of what's going on
> here without pgpool doesn't seem accurate. Attached is a test case that
> tries to duplicate what you've described and it errors out normally.
> Below is the JDBC driver's protocol level logging.
>
> 21:41:39.407 (1) FE=> Parse(stmt=S_1,query="BEGIN",oids={})
> 21:41:39.407 (1) FE=> Bind(stmt=S_1,portal=null)
> 21:41:39.407 (1) FE=> Execute(portal=null,limit=0)
> 21:41:39.408 (1) FE=> Parse(stmt=null,query="SELECT $1 ",oids={23})
> 21:41:39.408 (1) FE=> Bind(stmt=null,portal=null,$1=<'1'>)
> 21:41:39.408 (1) FE=> Describe(portal=null)
> 21:41:39.408 (1) FE=> Execute(portal=null,limit=0)
> 21:41:39.408 (1) FE=> Parse(stmt=null,query=" SELECT SELECT $1
> ",oids={23})
> 21:41:39.408 (1) FE=> Bind(stmt=null,portal=null,$1=<'2'>)
> 21:41:39.409 (1) FE=> Describe(portal=null)
> 21:41:39.409 (1) FE=> Execute(portal=null,limit=0)
> 21:41:39.409 (1) FE=> Sync
> 21:41:39.443 (1) <=BE ParseComplete [S_1]
> 21:41:39.443 (1) <=BE BindComplete [null]
> 21:41:39.443 (1) <=BE CommandStatus(BEGIN)
> 21:41:39.443 (1) <=BE ParseComplete [null]
> 21:41:39.443 (1) <=BE BindComplete [null]
> 21:41:39.444 (1) <=BE RowDescription(1)
> 21:41:39.444 (1) <=BE DataRow
> 21:41:39.444 (1) <=BE CommandStatus(SELECT)
> 21:41:39.454 (1) <=BE ErrorMessage(ERROR: syntax error at or near
> "SELECT"
> Position: 9)
>
> So this shows everything working as expected. Perhaps enabling this
> logging on your JDBC client would show more clearly what it is trying to
> do.

Thanks for the clarification. I will look more closely at the packet
exchange between pgpool and PostgreSQL.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-30 12:46:35
Message-ID: 20091230.214635.09776596.t-ishii@sraoss.co.jp

> While investigating a complaint from a pgpool user, I found that
> PostgreSQL crashes with the following stack trace:
>
> [stack trace snipped]

Ok, I think I understand what's going on.

parse
bind
describe
execute

This sequence of messages creates a cached plan in the unnamed portal:

$5 = {name = 0x8574de4 "", prepStmtName = 0x0, heap = 0x8598400,
resowner = 0x8598488, cleanup = 0x81632ca <PortalCleanup>, createSubid = 1,
sourceText = 0x85ab818 " SELECT <omitted>"...,
commandTag = 0x84682ca "SELECT", stmts = 0xaabf43b0, cplan = 0xaabf4950,
portalParams = 0x0, strategy = PORTAL_ONE_SELECT, cursorOptions = 4,
status = PORTAL_READY, queryDesc = 0x85abc20, tupDesc = 0x85ddcb0,
formats = 0x85abc68, holdStore = 0x0, holdContext = 0x0, atStart = 1 '\001',
atEnd = 1 '\001', posOverflow = 0 '\0', portalPos = 0,
creation_time = 315487957498169, visible = 1 '\001'}

The cached plan (portal->cplan) and statements (portal->stmts) are
created by exec_bind_message():

/*
* Revalidate the cached plan; this may result in replanning. Any
* cruft will be generated in MessageContext. The plan refcount will
* be assigned to the Portal, so it will be released at portal
* destruction.
*/
cplan = RevalidateCachedPlan(psrc, false);
plan_list = cplan->stmt_list;

Please note that cplan and stmts belong to the same memory context.

Then the following messages arrive:

parse (invalid SQL, which aborts the transaction)
bind (error)
describe (error)
execute (crash)

The parse causes the transaction to abort, which leads to the call chain
AbortCurrentTransaction->AbortTransaction->AtAbort_Portals->ReleaseCachedPlan,
i.e. ReleaseCachedPlan(portal->cplan) gets called. ReleaseCachedPlan calls
MemoryContextDelete(plan->context), which destroys both portal->cplan
and portal->stmts.

That is why I got a segfault when accessing portal->stmts.

To fix this, I think exec_execute_message should throw an error if
portal->cleanup is NULL, since AtAbort_Portals sets portal->cleanup to NULL
at transaction abort (or when the portal is dropped).

Here is a suggested fix:

diff -c postgres.c~ postgres.c
*** postgres.c~ 2009-06-18 19:08:08.000000000 +0900
--- postgres.c 2009-12-30 21:34:49.000000000 +0900
***************
*** 1804,1810 ****
dest = DestRemoteExecute;

portal = GetPortalByName(portal_name);
! if (!PortalIsValid(portal))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_CURSOR),
errmsg("portal \"%s\" does not exist", portal_name)));
--- 1804,1810 ----
dest = DestRemoteExecute;

portal = GetPortalByName(portal_name);
! if (!PortalIsValid(portal) || (PortalIsValid(portal) && portal->cleanup == NULL))
ereport(ERROR,
(errcode(ERRCODE_UNDEFINED_CURSOR),
errmsg("portal \"%s\" does not exist", portal_name)));

--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-30 12:59:25
Message-ID: 4B3B4EAD.5090001@dunslane.net

Tatsuo Ishii wrote:
> ! if (!PortalIsValid(portal) || (PortalIsValid(portal) && portal->cleanup == NULL))
>
>

Surely the second call to PortalIsValid() is redundant.

if (( !PortalIsValid(portal)) || portal->cleanup == NULL)

should do it, no?

cheers

andrew


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: andrew(at)dunslane(dot)net
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-30 14:26:00
Message-ID: 20091230.232600.89610118.t-ishii@sraoss.co.jp

> Tatsuo Ishii wrote:
> > ! if (!PortalIsValid(portal) || (PortalIsValid(portal) && portal->cleanup == NULL))
> >
> >
>
>
> Surely the second call to PortalIsValid() is redundant.
>
> if (( !PortalIsValid(portal)) || portal->cleanup == NULL)
>
> should do it, no?

Oops. You are right.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-30 16:51:10
Message-ID: 27108.1262191870@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> The parse causes the transaction to abort, which leads to the call chain
> AbortCurrentTransaction->AbortTransaction->AtAbort_Portals->ReleaseCachedPlan,
> i.e. ReleaseCachedPlan(portal->cplan) gets called. ReleaseCachedPlan calls
> MemoryContextDelete(plan->context), which destroys both portal->cplan
> and portal->stmts.

> That is why I got a segfault when accessing portal->stmts.

> To fix this, I think exec_execute_message should throw an error if
> portal->cleanup is NULL, since AtAbort_Portals sets portal->cleanup to NULL
> at transaction abort (or when the portal is dropped).

This is just a kluge, and a rather bad one I think. The real problem
here is that AtAbort_Portals destroys the portal contents and doesn't
do anything to record the fact. It should probably be putting the
portal into PORTAL_FAILED state, and what exec_execute_message ought
to be doing is checking for that. It might be a good idea to explicitly
zero out the now-dangling pointers in the Portal struct, too.

It'd be nice to have a test case for this, hint hint ...

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-31 01:48:48
Message-ID: 20091231.104848.34368524.t-ishii@sraoss.co.jp

> This is just a kluge, and a rather bad one I think. The real problem
> here is that AtAbort_Portals destroys the portal contents and doesn't
> do anything to record the fact. It should probably be putting the
> portal into PORTAL_FAILED state, and what exec_execute_message ought
> to be doing is checking for that.

Yeah, I thought about that too. In AtAbort_Portals:

--------------------------------------------------------------------------
/*
* Abort processing for portals.
*
* At this point we reset "active" status and run the cleanup hook if
* present, but we can't release the portal's memory until the cleanup call.
*
* The reason we need to reset active is so that we can replace the unnamed
* portal, else we'll fail to execute ROLLBACK when it arrives.
*/
void
AtAbort_Portals(void)
{
HASH_SEQ_STATUS status;
PortalHashEnt *hentry;

hash_seq_init(&status, PortalHashTable);

while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
{
Portal portal = hentry->portal;

if (portal->status == PORTAL_ACTIVE)
portal->status = PORTAL_FAILED;
--------------------------------------------------------------------------

Should I change the last if statement to this?

if (portal->status == PORTAL_ACTIVE || portal->status == PORTAL_READY)
portal->status = PORTAL_FAILED;

> zero out the now-dangling pointers in the Portal struct, too.

portal->cplan is already zeroed out by PortalReleaseCachedPlan. The problem
is that portal->stmts may belong to PortalContext or to some other context
(as in this particular case). So if we want to zero out portal->stmts, we
need to remember which memory context it belongs to, which means adding a
new struct member to Portal. I'm afraid that is overkill...

> It'd be nice to have a test case for this, hint hint ...

Still working on it...
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-31 06:56:11
Message-ID: 20091231.155611.128866384.t-ishii@sraoss.co.jp

> > It'd be nice to have a test case for this, hint hint ...
>
> Still working on it...

Done. Included are a C test program and a modified fe-exec.c.

The modification to fe-exec.c sends a Sync after each Parse, Bind
and Describe. Pgpool-II does this in order to get the current transaction
status.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Attachment Content-Type Size
unknown_filename text/plain 1.7 KB
unknown_filename text/plain 77.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2009-12-31 23:37:52
Message-ID: 20349.1262302672@sss.pgh.pa.us

Tatsuo Ishii <ishii(at)postgresql(dot)org> writes:
> Done. Included are a C test program and a modified fe-exec.c.

> The modification to fe-exec.c sends a Sync after each Parse, Bind
> and Describe. Pgpool-II does this in order to get the current transaction
> status.

I tried this but didn't have any luck crashing the backend. libpq gets
tremendously confused by the extra ReadyForQuery responses, which is
unsurprising. The postmaster log shows

LOG: could not send data to client: Broken pipe
ERROR: relation "foo" does not exist at character 15
STATEMENT: SELECT * FROM foo
ERROR: unnamed prepared statement does not exist
ERROR: current transaction is aborted, commands ignored until end of transaction block
ERROR: current transaction is aborted, commands ignored until end of transaction block
STATEMENT: SELECT NULL , n.nspname, ct.relname, a.attname, a.attnum, ci.relname FROM pg_catalog.pg_namespace n, pg_catalog.pg_class ct, pg_catalog.pg_class ci, pg_catalog.pg_attribute a, pg_catalog.pg_index i WHERE ct.oid=i.indrelid AND ci.oid=i.indexrelid AND a.attrelid=ci.oid AND i.indisprimary AND ct.relname = 'mst_Ucompany_feature_setting' AND ct.relnamespace = n.oid AND n.nspname = 'foo' ORDER BY 1, 2, 3

So the "unnamed prepared statement does not exist" bit seems to be
related to what you are talking about, but it doesn't actually fail.

regards, tom lane


From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: exec_execute_message crash
Date: 2010-01-03 12:00:33
Message-ID: 20100103.210033.38086058.t-ishii@sraoss.co.jp

> I tried this but didn't have any luck crashing the backend. libpq gets
> tremendously confused by the extra ReadyForQuery responses, which is
> unsurprising. The postmaster log shows
>
> LOG: could not send data to client: Broken pipe
> ERROR: relation "foo" does not exist at character 15
> STATEMENT: SELECT * FROM foo
> ERROR: unnamed prepared statement does not exist
> ERROR: current transaction is aborted, commands ignored until end of transaction block
> ERROR: current transaction is aborted, commands ignored until end of transaction block
> STATEMENT: SELECT NULL , n.nspname, ct.relname, a.attname, a.attnum, ci.relname FROM pg_catalog.pg_namespace n, pg_catalog.pg_class ct, pg_catalog.pg_class ci, pg_catalog.pg_attribute a, pg_catalog.pg_index i WHERE ct.oid=i.indrelid AND ci.oid=i.indexrelid AND a.attrelid=ci.oid AND i.indisprimary AND ct.relname = 'mst_Ucompany_feature_setting' AND ct.relnamespace = n.oid AND n.nspname = 'foo' ORDER BY 1, 2, 3
>
> So the "unnamed prepared statement does not exist" bit seems to be
> related to what you are talking about, but it doesn't actually fail.

I added some debugging code that calls GetMemoryChunkContext to make sure
that portal->cplan and portal->stmts belong to the same memory context, and
indeed they did. It appears that the memory really was deleted by
MemoryContextDelete in ReleaseCachedPlan. I also defined
CLOBBER_FREED_MEMORY in aset.c so that freed memory is filled with 0x7f.
The strange thing was that portal->stmts was not clobbered with 0x7f.
It seems I have missed something here...
--
Tatsuo Ishii
SRA OSS, Inc. Japan