Re: pgsql_fdw, FDW for PostgreSQL server

From: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>
To: Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>
Cc: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Date: 2012-02-16 17:02:59
Message-ID: CADyhKSX4UL0OpZ_o39zwQi+oQmu2fno9RU5kQFVPinL3NN95gQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I found a strange behavior with v10. Is it available to reproduce?

In case of "ftbl" is declared as follows:
postgres=# select * FROM ftbl;
a | b
---+-----
1 | aaa
2 | bbb
3 | ccc
4 | ddd
5 | eee
(5 rows)

I tried to raise an error on remote side.

postgres=# select * FROM ftbl WHERE 100 / (a - 3) > 0;
The connection to the server was lost. Attempting reset: Failed.
The connection to the server was lost. Attempting reset: Failed.
!> \q

Its call-trace was:

(gdb) bt
#0 0x00000031030810a4 in free () from /lib64/libc.so.6
#1 0x00007f2caa620bd9 in PQclear (res=0x2102500) at fe-exec.c:679
#2 0x00007f2caa83c4db in execute_query (node=0x20f20a0) at pgsql_fdw.c:722
#3 0x00007f2caa83c64a in pgsqlIterateForeignScan (node=0x20f20a0)
at pgsql_fdw.c:402
#4 0x00000000005c120f in ForeignNext (node=0x20f20a0) at nodeForeignscan.c:50
#5 0x00000000005a9b37 in ExecScanFetch (recheckMtd=0x5c11c0 <ForeignRecheck>,
accessMtd=0x5c11d0 <ForeignNext>, node=0x20f20a0) at execScan.c:82
#6 ExecScan (node=0x20f20a0, accessMtd=0x5c11d0 <ForeignNext>,
recheckMtd=0x5c11c0 <ForeignRecheck>) at execScan.c:132
#7 0x00000000005a2128 in ExecProcNode (node=0x20f20a0) at execProcnode.c:441
#8 0x000000000059edc2 in ExecutePlan (dest=0x210f280,
direction=<optimized out>, numberTuples=0, sendTuples=1 '\001',
operation=CMD_SELECT, planstate=0x20f20a0, estate=0x20f1f88)
at execMain.c:1449

This is the PG_CATCH block at execute_query(). fetch_result() raises
an error, then it shall be catched to release PGresult.
Although "res" should be NULL at this point, PQclear was called with
a non-zero value according to the call trace.

More strangely, I tried to inject elog(INFO, ...) to show the value of "res"
at this point. Then, it become unavailable to reproduce when I tried to
show the pointer of "res" with elog(INFO, "&res = %p", &res);

Why the "res" has a non-zero value, even though it was cleared prior
to fetch_result() and an error was raised within this function?

Thanks,

2012年2月16日13:41 Shigeru Hanada <shigeru(dot)hanada(at)gmail(dot)com>:
> Kaigai-san,
>
> Thanks for the review. Attached patches are revised version, though
> only fdw_helper_v5.patch is unchanged.
>
> (2012/02/16 0:09), Kohei KaiGai wrote:
>> [memory context of tuple store]
>> It calls tuplestore_begin_heap() under the memory context of
>> festate->scan_cxt at pgsqlBeginForeignScan.
>
> Yes, it's because tuplestore uses a context which was current when
> tuplestore_begin_heap was called. I want to use per-scan context for
> tuplestore, to keep its content tuples alive through the scan.
>
>> On the other hand, tuplestore_gettupleslot() is called under the
>> memory context of festate->tuples.
>
> Yes, result tuples to be returned to executor should be allocated in
> per-scan context and live until next IterateForeignScan (or
> EndForeignScan), because such tuple will be released via ExecClearTuple
> in next IterateForeignScan call. If we don't switch context to per-scan
> context, result tuple is allocated in per-tuple context and cause
> double-free and server crash.
>
>> I could not find a callback functions being invoked on errors,
>> so I doubt the memory objects acquired within tuplestore_begin_heap()
>> shall be leaked, even though it is my suggestion to create a sub-context
>> under the existing one.
>
> How do you confirmed that no callback function is invoked on errors? I
> think that memory objects acquired within tuplestore_begin_heap (I guess
> you mean excluding stored tuples, right?) are released during cleanup of
> aborted transaction. I tested that by adding elog(ERROR) to the tail of
> store_result() for intentional error, and execute large query 100 times
> in a session. I saw VIRT value (via top command) comes down to constant
> level after every query.
>
>> In my opinion, it is a good choice to use es_query_cxt of the supplied EState.
>> What does prevent to apply this per-query memory context?
>
> Ah, I've confused context management of pgsql_fdw... I fixed pgsql_fdw
> to create per-scan context as a child of es_query_cxt in
> BeginForeignScan, and use it for tuplestore of the scan. So, tuplestore
> and its contents are released correctly at EndForeignScan, or cleanup of
> aborted transaction in error case.
>
>> You mention about PGresult being malloc()'ed. However, it seems to me
>> fetch_result() and store_result() once copy the contents on malloc()'ed
>> area to the palloc()'ed area, and PQresult is released on an error using
>> PG_TRY() ... PG_CATCH() block.
>
> During thinking about this comment, I found double-free bug of PGresult
> in execute_query, thanks :)
>
> But, sorry, I'm not sure what the concern you show here is. The reason
> why copying tuples from malloc'ed area to palloc'ed area is to release
> PGresult before returning from the IterateForeingScan call. The reason
> why using PG_TRY block is to sure that PGresult is released before jump
> back to upstream in error case.
>
>> [Minor comments]
>> Please set NULL to "sql" variable at begin_remote_tx().
>> Compiler raises a warnning due to references of uninitialized variable,
>> even though the code path never run.
>
> Fixed. BTW, just out of curiosity, which compiler do you use? My
> compiler ,gcc (GCC) 4.6.0 20110603 (Red Hat 4.6.0-10) on Fedora 15,
> doesn't warn it.
>
>> It potentially causes a problem in case when fetch_result() raises an
>> error because of unexpected status (!= PGRES_TUPLES_OK).
>> One code path is not protected with PG_TRY(), and other code path
>> will call PQclear towards already released PQresult.
>
> Fixed.
>
>> Although it is just a preference of mine, is the exprFunction necessary?
>> It seems to me, the point of push-down check is whether the supplied
>> node is built-in object, or not. So, an sufficient check is is_builtin() onto
>> FuncExpr->funcid, OpExpr->opno, ScalarArrayOpExpr->opno and so on.
>> It does not depend on whether the function implementing these nodes
>> are built-in or not.
>
> Got rid of exprFunction and fixed foreign_expr_walker to check function
> oid in each case label.
>
> Regards,
> --
> Shigeru Hanada

--
KaiGai Kohei <kaigai(at)kaigai(dot)gr(dot)jp>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jay Levitt 2012-02-16 17:12:25 Re: Designing an extension for feature-space similarity search
Previous Message Marko Kreen 2012-02-16 15:17:25 Re: Speed dblink using alternate libpq tuple storage