pg_dump causes postgres crash

Lists: pgsql-general
From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: pg_dump causes postgres crash
Date: 2007-08-23 02:44:25
Message-ID: 864994.9762.qm@web60819.mail.yahoo.com

Fairly new (less than a week) install.
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by
GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)"

database size around 43 gigabytes.

2 attempts at a pg_dump across the network caused the
database to go down...

The first time I thought it was because of mismatched
pg_dump (was version 8.0.X)...but the second time it
was definitely 8.2.4 version of pg_dump.

My first thought was corruption...but this database
has successfully seeded 2 slony subscriber nodes from
scratch as well as running flawlessly under heavy load
for the past week.

Even more odd is that a LOCAL pg_dump (from on the
box) succeeded just fine tonight (after the second
crash).

Thoughts?

----First Crash-------

backup-srv2 prod_backup # time /usr/bin/pg_dump --format=c --compress=9 --ignore-version --username=backup --host=prod_server prod > x

pg_dump: server version: 8.2.4; pg_dump version: 8.0.13
pg_dump: proceeding despite version mismatch
pg_dump: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_dump: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_dump: SQL command to dump the contents of table "access_logs" failed: PQendcopy() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_dump: The command was: COPY public.access_logs (ip, username, "action", date, params) TO stdout;

------Second Crash--------

backup-srv2 ~ # time /usr/bin/pg_dump --format=c --compress=9 --username=backup --host=prod_server prod | wc -l
pg_dump: Dumping the contents of table "audit" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_dump: The command was: COPY public.audit (audit_id, entity_id, table_name, serial_id, audit_action, when_ts, user_id, user_ip) TO stdout;




From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump causes postgres crash
Date: 2007-08-23 03:55:34
Message-ID: 16452.1187841334@sss.pgh.pa.us

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Even more odd is that a LOCAL pg_dump (from on the
> box) succeeded just fine tonight (after the second
> crash).

That seems to eliminate the theory of a crash due to data corruption
... unless the corruption somehow repaired itself in the intervening
30 minutes, which hardly seems likely.

> ----First Crash-------

> backup-srv2 prod_backup # time /usr/bin/pg_dump --format=c --compress=9 --ignore-version --username=backup --host=prod_server prod > x
> pg_dump: server version: 8.2.4; pg_dump version: 8.0.13
> pg_dump: proceeding despite version mismatch
> pg_dump: WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

Notice that pg_dump is showing that the crash was in some OTHER server
process, not the one it was attached to.
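
The distinction above can be sketched as a small heuristic. This is only an illustration, not anything pg_dump or libpq provides: the function name and the returned labels are hypothetical, and the matched phrases are taken from the output quoted in this thread.

```python
def classify_pg_dump_failure(stderr_text: str) -> str:
    """Guess which backend died, based only on pg_dump's error text.

    Hypothetical helper; the labels are made up for illustration.
    """
    # Collapse mail-wrapped lines so phrases split across lines still match.
    text = " ".join(stderr_text.split()).lower()

    # The postmaster told our backend to exit because a DIFFERENT backend crashed.
    if "crash of another server process" in text:
        return "other-backend"

    # The connection simply vanished: possibly our own backend crashed,
    # though the text alone cannot rule out other causes.
    if "server closed the connection unexpectedly" in text:
        return "possibly-own-backend"

    return "unknown"
```

Applied to the two failures in this thread, the first would be labelled "other-backend" and the second "possibly-own-backend". The label is a hint for where to look in the postmaster log, not a diagnosis.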

> ------Second Crash--------

> backup-srv2 ~ # time /usr/bin/pg_dump --format=c --compress=9 --username=backup --host=prod_server prod | wc -l
> pg_dump: Dumping the contents of table "audit" failed: PQgetCopyData() failed.
> pg_dump: Error message from server: server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> pg_dump: The command was: COPY public.audit (audit_id,

This one looks more like it might have been the directly connected
server process that crashed. However, your postmaster log from
the other message:

> From the logs tonight when the second crash occurred..
> Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [6-1] 2007-08-22 20:45:12 CDT LOG: received smart shutdown request
> Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [7-1] 2007-08-22 20:45:12 CDT LOG: server process (PID 20188) was terminated by signal 11
> Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [8-1] 2007-08-22 20:45:12 CDT LOG: terminating any other active server processes

raises still more questions: where the heck did the "smart shutdown
request" (that is to say, a SIGTERM interrupt to the postmaster) come
from? It's far too much of a coincidence for that to have occurred
within a second of detecting the server process crash.
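
The timing observation above can be checked mechanically against a postmaster log. The following is a rough sketch, assuming each log entry sits on a single line in the syslog layout quoted above; the function names, the regular expression, and the one-second window are illustrative assumptions, not part of PostgreSQL.

```python
import re
from datetime import datetime

# Sample entries copied from the log excerpt in this thread (unwrapped).
SAMPLE_LOG = [
    "Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [6-1] "
    "2007-08-22 20:45:12 CDT LOG: received smart shutdown request",
    "Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [7-1] "
    "2007-08-22 20:45:12 CDT LOG: server process (PID 20188) was terminated by signal 11",
    "Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [8-1] "
    "2007-08-22 20:45:12 CDT LOG: terminating any other active server processes",
]

# Assumed layout: "<YYYY-MM-DD HH:MM:SS> <TZ> LOG: <message>"
LOG_RE = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+ LOG:\s+(.*)")


def parse_events(lines):
    """Return (timestamp, message) pairs for lines matching the assumed layout."""
    events = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((ts, match.group(2).strip()))
    return events


def shutdown_near_crash(events, window_seconds=1):
    """True if a shutdown request and a signal-terminated backend
    were logged within window_seconds of each other."""
    shutdowns = [ts for ts, msg in events if "shutdown request" in msg]
    crashes = [ts for ts, msg in events if "terminated by signal" in msg]
    return any(
        abs((s - c).total_seconds()) <= window_seconds
        for s in shutdowns
        for c in crashes
    )
```

Running shutdown_near_crash(parse_events(SAMPLE_LOG)) on the excerpt flags the coincidence. It says nothing about where the SIGTERM came from, only that the two events were logged together.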

> We have introduced some new network architecture which
> is acting odd lately (dell managed switches, netscreen
> ssgs, etc) and the database itself resides on a zfs
> partition on a Pillar SAN (connected via fibre
> channel)

I can't help thinking you are looking at generalized system
instability. Maybe someone knocked a few cables loose while
installing new network hardware?

regards, tom lane


From: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump causes postgres crash
Date: 2007-08-23 11:08:26
Message-ID: 777010.97329.qm@web60823.mail.yahoo.com

--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I can't help thinking you are looking at generalized system
> instability. Maybe someone knocked a few cables loose while
> installing new network hardware?

Database server/storage instability or network
instability?

There is no doubt that there is something flaky about
the networking between the db server and the box(es)
trying to do the pg_dump. We have indeed had issues
(timeouts, halts, etc.) moving large quantities of data
across various segments to and from boxes like the db
server, but how would this affect something like a
pg_dump?

Would a good stack trace (assuming I want to crash my
database again) help here?




From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pg_dump causes postgres crash
Date: 2007-08-23 14:30:57
Message-ID: 1353.1187879457@sss.pgh.pa.us

Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Would a good stack trace (assuming I want to crash my
> database again) help here?

Well, it'd be more information than we have now ...

regards, tom lane