Re: pg_dump or hardware?

Lists: pgsql-bugs
From: andrea suisani <andrea(dot)suisani(at)opinioni(dot)net>
To: "PostgreSQL Bugs List" <pgsql-bugs(at)postgresql(dot)org>
Subject: pg_dump or hardware?
Date: 2006-04-19 11:20:22
Message-ID: 44461CF6.4060302@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi all,

short summary:

platform: i386 SMP (dual PIII)
os: linux 2.6.8.1
vendor: debian (3.1, stable)
pgsql ver: 7.4.7 (deb)
disk: tech. SCSI vendor. IBM model. DDYS-T36950N rev. S96H
controller: adaptec aic-7892a

description:

we're experiencing a weird problem
trying to get a dump of our db for backup purposes,
the executed command is:

/usr/bin/pg_dump -U postgres -h 6pali elenco | /usr/bin/bzip2 > elenco_test.bz2

the output:

pg_dump: ERROR: could not open relation with OID 201327173
pg_dump: SQL command to dump the contents of table "nominativi" failed: PQendcopy() failed.
pg_dump: Error message from server: ERROR: could not open relation with OID 201327173
pg_dump: The command was: COPY public.nominativi (nome_cogno, indirizzo, cap, citta, prov,
prefisso, telefono1, telefono2, note, idpersona, estrazione, num_estra, occupato,
cod_prov, cod_com, cod_reg, capoluo, rand) TO stdout;

so it seems that we've got some problems with the "nominativi" table
(a 20 million-row table), in fact the following command also fails:

pg_dump -t nominativi -U postgres -h 6pali elenco | /usr/bin/bzip2 > nominativi.bz2

with the same err msg as before. Before the erros occurs we're are able to
get a partial backup, see:

#> ls -l nominativi.bz2
-rw-r--r-- 1 sickpig users 2.5M apr 19 12:35 nominativi.bz2

#> wc -l nominativi
145904 nominativi

We're trying to understand whether this is due to data corruption or
hardware failure. We run long self-tests on our SCSI disk through
smartmontools on a regular basis. see attached file for "smartctl -a /dev/sda"
output. All suggestions are welcome.

Regards,

Andrea

Attachment Content-Type Size
smartctl.txt text/plain 3.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: andrea suisani <andrea(dot)suisani(at)opinioni(dot)net>
Cc: "PostgreSQL Bugs List" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_dump or hardware?
Date: 2006-04-19 17:28:48
Message-ID: 277.1145467728@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

andrea suisani <andrea(dot)suisani(at)opinioni(dot)net> writes:
> pg_dump: ERROR: could not open relation with OID 201327173
> pg_dump: SQL command to dump the contents of table "nominativi" failed: PQendcopy() failed.
> pg_dump: Error message from server: ERROR: could not open relation with OID 201327173

Hmm ... what do you get from

select oid from pg_class where relname = 'nominativi';
select relname from pg_class where oid = 201327173;

If the first returns 201327173 while the second doesn't return anything,
then I'd wonder about corruption of pg_class's OID index. REINDEXing it
might help. I don't remember whether 7.4 makes you use a standalone
backend to reindex system catalogs --- see its REINDEX man page for
details.

regards, tom lane


From: andrea suisani <andrea(dot)suisani(at)opinioni(dot)net>
To: PostgreSQL Bugs List <pgsql-bugs(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: pg_dump or hardware?
Date: 2006-04-20 07:36:12
Message-ID: 444739EC.70300@opinioni.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Thanks for the quick reply

Tom Lane wrote:
> andrea suisani <andrea(dot)suisani(at)opinioni(dot)net> writes:

[cut]

> Hmm ... what do you get from
>
> select oid from pg_class where relname = 'nominativi';
oid
--------
561644
(1 row)

afaics it seems weird.... does this mean that another postgresql
object screw up? this is the "\d nominativi" output:

Table "public.nominativi"
Column | Type | Modifiers
------------+------------------+-----------
nome_cogno | character(70) |
indirizzo | character(40) |
cap | character(5) |
citta | character(35) |
prov | character(2) |
prefisso | character(4) |
telefono1 | character(13) |
telefono2 | character(13) |
note | character(100) |
idpersona | integer |
estrazione | date |
num_estra | integer |
occupato | boolean |
cod_prov | integer |
cod_com | integer |
cod_reg | integer |
capoluo | integer |
rand | double precision |
Indexes:
"citta1_idx" btree (prov, citta) WHERE (((prov = 'BL'::bpchar) OR (prov =
'PD'::bpchar) OR (prov = 'RO'::bpchar) OR (prov = 'VE'::bpchar) OR (prov = 'TV'::bpchar)
OR (prov = 'VR'::bpchar)) AND (cod_com IS NULL))
"cod_com_ndx1" btree (cod_com)
"codprov_capo_rand1" btree (cod_prov, capoluo, rand)
"pre_tel_index1" btree (prefisso, telefono1)

and none of nominativi's indexes has an oid like 201327173

select oid
from pg_class
where relname in ('citta1_idx','cod_com_ndx1','codprov_capo_rand1','pre_tel_index1');
oid
----------
46788374
40916657
40916656
40916658
(4 rows)

> select relname from pg_class where oid = 201327173;
relname
---------
(0 rows)

while if I "ask" for oid 561644 this is what I get:

select relname from pg_class where oid = 561644;
relname
------------
nominativi
(1 row)

> If the first returns 201327173 while the second doesn't return anything,
> then I'd wonder about corruption of pg_class's OID index. REINDEXing it
> might help.
> I don't remember whether 7.4 makes you use a standalone
> backend to reindex system catalogs --- see its REINDEX man page for
> details.

from what I can see from REINDEX man page, I can reindex shared system catalogs
in stand-alone mode

[cut]

Regards,

Andrea