Lists: | pgsql-general |
---|
From: | Jeff Amiel <becauseimjeff(at)yahoo(dot)com> |
---|---|
To: | pgsql-general(at)postgresql(dot)org |
Subject: | pg_dump causes postgres crash |
Date: | 2007-08-23 02:44:25 |
Message-ID: | 864994.9762.qm@web60819.mail.yahoo.com |
Fairly new (less than a week) install.
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by
GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)"
database size around 43 gigabytes.
2 attempts at a pg_dump across the network caused the
database to go down...
The first time I thought it was because of mismatched
pg_dump (was version 8.0.X)...but the second time it
was definitely 8.2.4 version of pg_dump.
My first thought was corruption...but this database
has successfully seeded 2 slony subscriber nodes from
scratch as well as running flawlessly under heavy load
for the past week.
Even more odd is that a LOCAL pg_dump (from on the
box) succeeded just fine tonight (after the second
crash).
Thoughts?
----First Crash-------
backup-srv2 prod_backup # time /usr/bin/pg_dump --format=c --compress=9 --ignore-version --username=backup --host=prod_server prod > x
pg_dump: server version: 8.2.4; pg_dump version: 8.0.13
pg_dump: proceeding despite version mismatch
pg_dump: WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
pg_dump: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: SQL command to dump the contents of table "access_logs" failed: PQendcopy() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.access_logs (ip, username, "action", date, params) TO stdout;
------Second Crash--------
backup-srv2 ~ # time /usr/bin/pg_dump --format=c --compress=9 --username=backup --host=prod_server prod | wc -l
pg_dump: Dumping the contents of table "audit" failed: PQgetCopyData() failed.
pg_dump: Error message from server: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_dump: The command was: COPY public.audit (audit_id, entity_id, table_name, serial_id, audit_action, when_ts, user_id, user_ip) TO stdout;
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeff Amiel <becauseimjeff(at)yahoo(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: pg_dump causes postgres crash |
Date: | 2007-08-23 03:55:34 |
Message-ID: | 16452.1187841334@sss.pgh.pa.us |
Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Even more odd is that a LOCAL pg_dump (from on the
> box) succeeded just fine tonight (after the second
> crash).
That seems to eliminate the theory of a crash due to data corruption
... unless the corruption somehow repaired itself in the intervening
30 minutes, which hardly seems likely.
> ----First Crash-------
> backup-srv2 prod_backup # time /usr/bin/pg_dump
> --format=c --compress=9 --ignore-version
> --username=backup --host=prod_server prod > x
> pg_dump: server version: 8.2.4; pg_dump version:
> 8.0.13
> pg_dump: proceeding despite version mismatch
> pg_dump: WARNING: terminating connection because of
> crash of another server process
> DETAIL: The postmaster has commanded this server
> process to roll back the current transaction and exit,
> because another server process exited abnormally and
> possibly corrupted shared memory.
Notice that pg_dump is showing that the crash was in some OTHER server
process, not the one it was attached to.
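The distinction drawn here can be checked mechanically. The following is only a rough sketch in Python, not anything pg_dump itself provides: the function name and the three labels are made up for illustration, and the matching relies solely on the message strings quoted in this thread.

```python
def classify_pg_dump_failure(stderr_text):
    """Roughly classify a failed pg_dump run from its error output.

    The labels are illustrative only:
      "collateral"      - the backend was told to exit because ANOTHER
                          server process crashed
      "possibly-direct" - the connection simply dropped, so the attached
                          backend itself may have been the one that died
      "unknown"         - neither message was found
    """
    # Collapse whitespace so e-mail line wrapping does not break matching.
    text = " ".join(stderr_text.split()).lower()
    # Check this first: the first crash's output contains both messages.
    if "crash of another server process" in text:
        return "collateral"
    if "server closed the connection unexpectedly" in text:
        return "possibly-direct"
    return "unknown"


first_crash = """pg_dump: WARNING: terminating connection because of
crash of another server process
pg_dump: server closed the connection unexpectedly"""

second_crash = """pg_dump: Error message from server: server closed the
connection unexpectedly"""

print(classify_pg_dump_failure(first_crash))
print(classify_pg_dump_failure(second_crash))
```

A "possibly-direct" result is only a hint; the postmaster log is what actually names the PID that was terminated.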
> ------Second Crash--------
> backup-srv2 ~ # time /usr/bin/pg_dump --format=c
> --compress=9 --username=backup --host=prod_server
> prod | wc -l
> pg_dump: Dumping the contents of table "audit" failed:
> PQgetCopyData() failed.
> pg_dump: Error message from server: server closed the
> connection unexpectedly
> This probably means the server terminated
> abnormally
> before or while processing the request.
> pg_dump: The command was: COPY public.audit (audit_id,
This one looks more like it might have been the directly connected
server process that crashed. However, your postmaster log from
the other message:
> From the logs tonight when the second crash occurred..
> Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848
> local0.info] [6-1] 2007-08-22 20:45:12 CDT LOG:
> received smart shutdown request
> Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848
> local0.info] [7-1] 2007-08-22 20:45:12 CDT LOG:
> server process (PID 20188) was terminated by signal 11
> Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848
> local0.info] [8-1] 2007-08-22 20:45:12 CDT LOG:
> terminating any other active server processes
raises still more questions: where the heck did the "smart shutdown
request" (that is to say, a SIGTERM interrupt to the postmaster) come
from? It's far too much of a coincidence for that to have occurred
within a second of detecting the server process crash.
> We have introduced some new network architecture which
> is acting odd lately (dell managed switches, netscreen
> ssgs, etc) and the database itself resides on a zfs
> partition on a Pillar SAN (connected via fibre
> channel)
I can't help thinking you are looking at generalized system
instability. Maybe someone knocked a few cables loose while
installing new network hardware?
regards, tom lane
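To check how close together the shutdown request and the crash really were, the postmaster log lines can be parsed and compared. This is a minimal sketch in Python that assumes the syslog layout quoted above (an embedded "YYYY-MM-DD HH:MM:SS TZ LOG:" stamp); the helper names and the one-second window are illustrative choices, not part of PostgreSQL.

```python
import re
from datetime import datetime

# Matches the timestamp PostgreSQL embeds in each line, e.g.
# "2007-08-22 20:45:12 CDT LOG: received smart shutdown request"
LOG_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \w+ LOG:\s+(.*)$"
)


def parse_events(lines):
    """Return (timestamp, message) pairs for lines that match LOG_RE."""
    events = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2).strip()))
    return events


def shutdown_near_crash(events, window_seconds=1):
    """True if a shutdown request and a signal-terminated backend were
    logged within window_seconds of each other."""
    shutdowns = [ts for ts, msg in events if "shutdown request" in msg]
    crashes = [ts for ts, msg in events if "terminated by signal" in msg]
    return any(
        abs((s - c).total_seconds()) <= window_seconds
        for s in shutdowns
        for c in crashes
    )


# The three log lines from this thread, with the e-mail wrapping undone.
sample = [
    "Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [6-1] "
    "2007-08-22 20:45:12 CDT LOG: received smart shutdown request",
    "Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [7-1] "
    "2007-08-22 20:45:12 CDT LOG: server process (PID 20188) was "
    "terminated by signal 11",
    "Aug 22 20:45:12 db-1 postgres[5805]: [ID 748848 local0.info] [8-1] "
    "2007-08-22 20:45:12 CDT LOG: terminating any other active server "
    "processes",
]

events = parse_events(sample)
print(shutdown_near_crash(events))
```

With one-second log resolution this can only show that the two events share a timestamp; it says nothing about which came first or what sent the SIGTERM.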
From: | Jeff Amiel <becauseimjeff(at)yahoo(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: pg_dump causes postgres crash |
Date: | 2007-08-23 11:08:26 |
Message-ID: | 777010.97329.qm@web60823.mail.yahoo.com |
--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> I can't help thinking you are looking at generalized
> system
> instability. Maybe someone knocked a few cables
> loose while
> installing new network hardware?
Database server/storage instability or network
instability?
There is no doubt that there is something flaky about
the networking between the db server and the box(es)
trying to do the pg_dump. We have indeed had issues
(timeouts, halts, etc) moving large quantities of data
across various segments to and from these boxes...like
the db server....but how would this affect something
like a pg_dump?
Would a good stack trace (assuming I want to crash my
database again) help here?
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jeff Amiel <becauseimjeff(at)yahoo(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: pg_dump causes postgres crash |
Date: | 2007-08-23 14:30:57 |
Message-ID: | 1353.1187879457@sss.pgh.pa.us |
Jeff Amiel <becauseimjeff(at)yahoo(dot)com> writes:
> Would a good stack trace (assuming I want to crash my
> database again) help here?
Well, it'd be more information than we have now ...
regards, tom lane
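One way to get such a trace, assuming gdb is installed on the database server and the crashed backend left a core file, is to run gdb in batch mode against the postgres binary and the core. The sketch below is in Python; both paths are placeholders rather than values from this thread, and the helper names are made up.

```python
import subprocess


def gdb_backtrace_command(postgres_binary, core_file):
    """Build a gdb invocation that prints a backtrace and exits.

    -batch runs non-interactively; -ex "bt" executes the backtrace command.
    """
    return ["gdb", "-batch", "-ex", "bt", postgres_binary, core_file]


def collect_backtrace(postgres_binary, core_file):
    """Run gdb and return whatever it printed (requires gdb on PATH)."""
    result = subprocess.run(
        gdb_backtrace_command(postgres_binary, core_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


# Placeholder paths: substitute the real binary and core file locations.
command = gdb_backtrace_command("/path/to/postgres", "/path/to/core")
print(" ".join(command))
```

A backtrace is far more readable when the binary still has its debugging symbols, and a core file is only written if the core size limit in effect when the postmaster was started allows it.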