array_agg crash?

Lists: pgsql-general
From: "Chris Spotts" <rfusca(at)gmail(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: array_agg crash?
Date: 2009-07-21 13:12:19
Message-ID: 00c301ca0a04$e1eccd20$a5c66760$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I had one simple query that kept crashing the connection. It crashes after
several minutes.

Tried restarting, it still error'd at the same place.

Tried recreating the table it was selecting from, it still error'd at the
same place.

I rewrote the query with an ARRAY subselect and it finished flawlessly in a
few seconds.

There is about 4 million records in the table its selecting from. No array
ends up with more than 4 elements.

For some reason, the log indicates this is causing an issue with
autovacuum.it says it was -9'd, but it wasn't by or any other physical
person.

Here is the problem query.

create table public.temptrips

as

select trip_id,array_agg(customer_upload_id)

from

trip_ids_to_customer_upload_ids

group by trip_id;

server closed the connection unexpectedly

This probably means the server terminated abnormally

before or while processing the request.

The connection to the server was lost. Attempting reset: Failed.

!>

LOG: 00000: autovacuum launcher process (PID 10264) was terminated by
signal 9: Killed

2009-07-21 08:44:26 EDT - LOCATION: LogChildExit, postmaster.c:2673

2009-07-21 08:44:26 EDT - LOG: 00000: terminating any other active
server processes

2009-07-21 08:44:26 EDT - LOCATION: HandleChildCrash, postmaster.c:2500

2009-07-21 08:44:26 EDT - WARNING: 57P02: terminating connection because
of crash of another server process

2009-07-21 08:44:26 EDT - DETAIL: The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

2009-07-21 08:44:26 EDT - HINT: In a moment you should be able to
reconnect to the database and repeat your command.

2009-07-21 08:44:26 EDT - LOCATION: quickdie, postgres.c:2495

2009-07-21 08:44:27 EDT - postgres postgres /usr/local/pgsql/bin/postmaster
FATAL: 57P03: the database system is in recovery mode

2009-07-21 08:44:27 EDT - postgres postgres /usr/local/pgsql/bin/postmaster
LOCATION: ProcessStartupPacket, postmaster.c:1721

2009-07-21 08:44:26 EDT - postgres postgres startup WARNING: 57P02:
terminating connection because of crash of another server process

2009-07-21 08:44:27 EDT - postgres postgres startup DETAIL: The postmaster
has commanded this server process to roll back the current transaction and
exit, because another server process exited abnormally and possibly
corrupted shared memory.

2009-07-21 08:44:27 EDT - postgres postgres startup HINT: In a moment you
should be able to reconnect to the database and repeat your command.

2009-07-21 08:44:27 EDT - postgres postgres startup LOCATION: quickdie,
postgres.c:2495

2009-07-21 08:44:28 EDT - LOG: 00000: all server processes terminated;
reinitializing

2009-07-21 08:44:28 EDT - LOCATION: PostmasterStateMachine,
postmaster.c:2858

2009-07-21 08:44:28 EDT - LOG: 00000: database system was interrupted;
last known up at 2009-07-21 08:41:32 EDT

2009-07-21 08:44:28 EDT - LOCATION: StartupXLOG, xlog.c:5236

2009-07-21 08:44:28 EDT - LOG: 00000: database system was not properly
shut down; automatic recovery in progress

2009-07-21 08:44:28 EDT - LOCATION: StartupXLOG, xlog.c:5410

2009-07-21 08:44:28 EDT - LOG: 00000: redo starts at 76/4380AC70

2009-07-21 08:44:28 EDT - LOCATION: StartupXLOG, xlog.c:5493

2009-07-21 08:44:29 EDT - LOG: 00000: record with zero length at
76/438869D0

2009-07-21 08:44:29 EDT - LOCATION: ReadRecord, xlog.c:3532

2009-07-21 08:44:29 EDT - LOG: 00000: redo done at 76/438869A0

2009-07-21 08:44:29 EDT - LOCATION: StartupXLOG, xlog.c:5625

2009-07-21 08:44:29 EDT - LOG: 00000: last completed transaction was at
log time 2009-07-21 08:41:49.707423-04

2009-07-21 08:44:29 EDT - LOCATION: StartupXLOG, xlog.c:5629

2009-07-21 08:44:30 EDT - LOG: 00000: autovacuum launcher started

2009-07-21 08:44:30 EDT - LOCATION: AutoVacLauncherMain,
autovacuum.c:529

2009-07-21 08:44:30 EDT - LOG: 00000: database system is ready to accept
connections

2009-07-21 08:44:30 EDT - LOCATION: reaper, postmaster.c:2272

It looks like this is causing the autovacuum to crash, what could cause
this?

postgres=# select version();

version

----------------------------------------------------------------------------
------------------------------------------

PostgreSQL 8.4.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC)
3.4.6 20060404 (Red Hat 3.4.6-8.0.1), 64-bit

(1 row)

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Chris Spotts" <rfusca(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: array_agg crash?
Date: 2009-07-21 13:55:05
Message-ID: 11654.1248184505@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Chris Spotts" <rfusca(at)gmail(dot)com> writes:
> LOG: 00000: autovacuum launcher process (PID 10264) was terminated by
> signal 9: Killed

Looks like the OOM killer is loose on your system. Disable memory
overcommit in the kernel and things will get better.
http://www.postgresql.org/docs/8.4/static/kernel-resources.html#AEN24152

regards, tom lane


From: "Chris Spotts" <rfusca(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: array_agg crash?
Date: 2009-07-21 14:26:25
Message-ID: 00cb01ca0a0f$3bd06220$b3712660$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> "Chris Spotts" <rfusca(at)gmail(dot)com> writes:
> > LOG: 00000: autovacuum launcher process (PID 10264) was terminated
> by
> > signal 9: Killed
>
> Looks like the OOM killer is loose on your system. Disable memory
> overcommit in the kernel and things will get better.
> http://www.postgresql.org/docs/8.4/static/kernel-
> resources.html#AEN24152
>
> regards, tom lane
[Spotts, Christopher]

That seems to have fixed it in the sense that it doesn't crash, but I killed
it after 15 minutes and no results...rather I tried to kill it, but it looks
like I'm going to have to -9 it...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Chris Spotts" <rfusca(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: array_agg crash?
Date: 2009-07-21 15:42:07
Message-ID: 13990.1248190927@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Chris Spotts" <rfusca(at)gmail(dot)com> writes:
> That seems to have fixed it in the sense that it doesn't crash, but I killed
> it after 15 minutes and no results...rather I tried to kill it, but it looks
> like I'm going to have to -9 it...

How big were the arrays you were trying to push around here? I tried
interrupting a similar query and it came right back; but if you were
building some really enormous arrays I could see the final array
build loop taking awhile, and there's no CHECK_FOR_INTERRUPTS in it...

regards, tom lane


From: "Chris Spotts" <rfusca(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: array_agg crash?
Date: 2009-07-21 16:57:10
Message-ID: 00e401ca0a24$4b130ac0$e1392040$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> killed
> > it after 15 minutes and no results...rather I tried to kill it, but
> it looks
> > like I'm going to have to -9 it...
>
> How big were the arrays you were trying to push around here? I tried
> interrupting a similar query and it came right back; but if you were
> building some really enormous arrays I could see the final array
> build loop taking awhile, and there's no CHECK_FOR_INTERRUPTS in it...
>
> regards, tom lane
[Spotts, Christopher]
Like I'd said originally, there were no arrays that ended up being more 4
elements long - all integers. The vast majority of them were 1 or 2 long.

If it was having trouble allocating the memory for arrays, why doesn't using
a ARRAY(subselect) yield the same issue?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Chris Spotts" <rfusca(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: array_agg crash?
Date: 2009-07-21 17:02:57
Message-ID: 15291.1248195777@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Chris Spotts" <rfusca(at)gmail(dot)com> writes:
>>> killed
>>> it after 15 minutes and no results...rather I tried to kill it, but
>>> it looks like I'm going to have to -9 it...
>>
>> How big were the arrays you were trying to push around here?

> Like I'd said originally, there were no arrays that ended up being more 4
> elements long - all integers. The vast majority of them were 1 or 2 long.

Hm, maybe the problem is not so much array size as too many arrays. How
many groups are you expecting in that query? Does the plan for the
array_agg query show hash or group aggregation?

regards, tom lane


From: "Chris Spotts" <rfusca(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: array_agg crash?
Date: 2009-07-21 17:09:13
Message-ID: 00ef01ca0a25$fa4f7720$eeee6560$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> >> How big were the arrays you were trying to push around here?
>
> > Like I'd said originally, there were no arrays that ended up being
> more 4
> > elements long - all integers. The vast majority of them were 1 or 2
> long.
>
> Hm, maybe the problem is not so much array size as too many arrays.
> How
> many groups are you expecting in that query? Does the plan for the
> array_agg query show hash or group aggregation?
>
> regards, tom lane
[Spotts, Christopher]
GroupAggregate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Chris Spotts" <rfusca(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: array_agg crash?
Date: 2009-07-21 17:16:05
Message-ID: 15511.1248196565@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Chris Spotts" <rfusca(at)gmail(dot)com> writes:
>> many groups are you expecting in that query? Does the plan for the
>> array_agg query show hash or group aggregation?

> GroupAggregate

Huh, there's no reason it should take much memory then. Maybe you've
found a memory leak. Can you put together a self-contained test case?

regards, tom lane


From: "Chris Spotts" <rfusca(at)gmail(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: array_agg crash?
Date: 2009-07-21 17:28:54
Message-ID: 00f001ca0a28$b98aee60$2ca0cb20$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: Tuesday, July 21, 2009 12:16 PM
> To: Chris Spotts
> Cc: pgsql-general(at)postgresql(dot)org
> Subject: Re: [GENERAL] array_agg crash?
>
> "Chris Spotts" <rfusca(at)gmail(dot)com> writes:
> >> many groups are you expecting in that query? Does the plan for the
> >> array_agg query show hash or group aggregation?
>
> > GroupAggregate
>
> Huh, there's no reason it should take much memory then. Maybe you've
> found a memory leak. Can you put together a self-contained test case?
>
> regards, tom lane
[Spotts, Christopher]
I'll try to put something together tonight and let you know tomorrow.


From: Chris Spotts <rfusca(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: array_agg crash?
Date: 2009-07-22 06:18:26
Message-ID: 4A66AF32.2090506@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane wrote:
> "Chris Spotts" <rfusca(at)gmail(dot)com> writes:
>
>>> many groups are you expecting in that query? Does the plan for the
>>> array_agg query show hash or group aggregation?
>>>
>
>
>> GroupAggregate
>>
>
> Huh, there's no reason it should take much memory then. Maybe you've
> found a memory leak. Can you put together a self-contained test case?
>
> regards, tom lane
>
What do you want specifically as far as details for the test case? I
exported just the table that that was reading from. Installed a new
clean virtual machine ubuntu (jaunty) and then installed 8.4.0.
Imported the table and definition. Ran the same query and the same
thing happened.

Table its selecting from is:
Table "public.trip_ids_to_customer_upload_ids"
Column | Type | Modifiers | Storage | Description
--------------------+---------+-----------+---------+-------------
trip_id | bigint | | plain |
customer_upload_id | integer | | plain |
Indexes:
"trips_customer_id" btree (trip_id, customer_upload_id)
Has OIDs: no

There is 3801347 rows in the table. There are 3773039 distinct trip_id
values. So you can see that the vast majority of rows here are just a
single element array.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Spotts <rfusca(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: array_agg crash?
Date: 2009-07-22 12:50:17
Message-ID: 18106.1248267017@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Chris Spotts <rfusca(at)gmail(dot)com> writes:
> What do you want specifically as far as details for the test case? I
> exported just the table that that was reading from. Installed a new
> clean virtual machine ubuntu (jaunty) and then installed 8.4.0.
> Imported the table and definition. Ran the same query and the same
> thing happened.

The table dump and the query would be enough then. Can you send it
to me off-list?

regards, tom lane