Re: [GENERAL] contrib module intagg crashing the backend

Lists: pgsql-bugspgsql-general
From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: contrib module intagg crashing the backend
Date: 2005-03-23 01:21:32
Message-ID: Pine.LNX.4.58.0503221717390.13920@greenie.cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

If one of the contrib modules (int_array_aggregate in contrib/intagg) is
crashing my backend, where's a good place to report a bug or get hints
where to look further?

It seems to work fine on small arrays, but crashes on large ones.
The second query would have put about 500 rows in the aggregate.

Thanks,
Ron

% psql fli fli
Welcome to psql 8.0.1, the PostgreSQL interactive terminal.
[...]

fli=# select str_name,int_array_aggregate(tlid) from tlid_streetnames
where str_name='civic center' and tigerfile=6001 group by str_name;
str_name | int_array_aggregate
--------------+-------------------------------------------
civic center | {125030363,125030026,125030039,125050707}
(1 row)

fli=# select str_name,int_array_aggregate(tlid) from tlid_streetnames
where str_name='main' group by str_name;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [GENERAL] contrib module intagg crashing the backend
Date: 2005-03-23 03:55:07
Message-ID: 20050323035507.GA24549@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

On Tue, Mar 22, 2005 at 05:21:32PM -0800, Ron Mayer wrote:
>
> If one of the contrib modules (int_array_aggregate in contrib/intagg) is
> crashing my backend, where's a good place to report a bug or get hints
> where to look further?

Bug reports should go to the pgsql-bugs mailing list (I've cc'd it
on this message). Searching the list archives for similar problems
can be helpful.

If the backend crashed then there might be a core dump somewhere
under the $PGDATA directory. Using a debugger like gdb to get a
stack trace from the core dump can help pinpoint the problem.

> It seems to work fine on small arrays, but crashes on large ones.
> The second query would have put about 500 rows in the aggregate.

I can duplicate the crash in 8.0.1 (REL8_0_STABLE) with the following
(512 works, 513 crashes):

CREATE TABLE foo (id integer, name text);
INSERT INTO foo (id, name)
SELECT x, 'test' FROM generate_series(1, 513) AS g(x);
SELECT name, int_array_aggregate(id) FROM foo GROUP BY name;

Here's a partial stack trace:

#0 0x81e003b in pfree (pointer=0x839f130) at mcxt.c:583
#1 0x810d044 in advance_transition_function (aggstate=0x839d188, peraggstate=0x839d908,
pergroupstate=0x83a1aa0, newVal=513, isNull=0 '\000') at nodeAgg.c:395
#2 0x810d0f4 in advance_aggregates (aggstate=0x839d188, pergroup=0x83a1aa0) at nodeAgg.c:439
#3 0x810d782 in agg_fill_hash_table (aggstate=0x839d188) at nodeAgg.c:935
#4 0x810d448 in ExecAgg (node=0x839d188) at nodeAgg.c:674

Line 583 in mcxt.c is:

(*header->context->methods->free_p) (header->context, pointer);

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [GENERAL] contrib module intagg crashing the backend
Date: 2005-03-23 05:10:20
Message-ID: 2113.1111554620@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

Michael Fuhr <mike(at)fuhr(dot)org> writes:
> I can duplicate the crash in 8.0.1 (REL8_0_STABLE) with the following

Grumble ... I seem to have managed to promote intagg from
broken-on-64bit-platforms to broken-on-every-platform ...
will look into a fix tomorrow.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-general(at)postgresql(dot)org, pgsql-bugs(at)postgresql(dot)org
Subject: Re: [GENERAL] contrib module intagg crashing the backend
Date: 2005-03-23 18:36:33
Message-ID: 8775.1111602993@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

I wrote:
> Grumble ... I seem to have managed to promote intagg from
> broken-on-64bit-platforms to broken-on-every-platform ...
> will look into a fix tomorrow.

Ron's problem is essentially a double-free bug. In this patch:

2005-01-27 16:35 tgl

* contrib/intagg/: int_aggregate.c, int_aggregate.sql.in
(REL7_3_STABLE), int_aggregate.c, int_aggregate.sql.in
(REL7_4_STABLE), int_aggregate.c, int_aggregate.sql.in
(REL8_0_STABLE), int_aggregate.c, int_aggregate.sql.in: Fix
security and 64-bit issues in contrib/intagg. This code could
stand to be rewritten altogether, but for now just stick a finger
in the dike.

I modified intagg to declare its transition data type as int4[] (which
is what it really is) rather than int4. Unfortunately that means that
nodeAgg.c is now aware that the transition value is pass-by-reference,
and so it thinks it needs to manage the memory used for it; which
intagg.c is also trying to do; so they both free the same bit of memory.

There is already a "proper" fix for this problem in CVS tip, but it's
too invasive to consider back-patching; not least because nodeAgg's
memory management strategy has changed since 7.3 and the fix would
probably not work that far back.

What I'm thinking I have to do is revert intagg in the back branches to
lie about its transition data type, but still have it pull the pointer
out of the passed Datum with DatumGetPointer (as opposed to the old,
definitely 64-bit-broken method of DatumGetInt32 and then cast to pointer).
This should work because nodeAgg doesn't inquire into the actual
contents of any Datum it doesn't think is pass-by-reference; so it will
never discard the upper bits of the pointer.

Ugh. Glad we have a cleaner solution to go forward with.

regards, tom lane