Re: after using pg_resetxlog, db lost

Lists: pgsql-performance
From: "Shea,Dan [CIS]" <Dan(dot)Shea(at)ec(dot)gc(dot)ca>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: after using pg_resetxlog, db lost
Date: 2004-06-24 16:14:54
Message-ID: 644D07D3D59D8F408CD01AC2F833D8C62B9217@cisxa.cmc.int.ec.gc.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I determined the largest was 12,293,162 and set it to
pg_resetxlog -x 15000000 /var/lib/pgsql/data

I am now able to see all the data.

I actually checked the log for the previous successfull startup before it the pg_control file was reset and it reported
Jun 22 11:55:44 pascal postgres[24993]: [5-1] LOG: next transaction ID: 14820367; next OID: 727013114

So I entered
pg_resetxlog -o 750000000 /var/lib/pgsql/data Setting oid value

I couldn't set 10000/0, so tried below
pg_resetxlog -l 10000,0 /var/lib/pgsql/data

This seems to be wrong because the databse is complaining and shutting down
Jun 24 15:02:05 murphy postgres[28061]: [6-1] LOG: checkpoint record is at 2710/1000050
Jun 24 15:02:05 murphy postgres[28061]: [7-1] LOG: redo record is at 2710/1000050; undo record is at 0/0; shutdown TRUE
Jun 24 15:02:05 murphy postgres[28061]: [8-1] LOG: next transaction ID: 15000010; next OID: 750000000
Jun 24 15:02:05 murphy postgres[28061]: [9-1] LOG: database system was not properly shut down; automatic recovery in progress
Jun 24 15:02:05 murphy postgres[28062]: [5-1] FATAL: the database system is starting up
Jun 24 15:02:05 murphy postgres[28063]: [5-1] FATAL: the database system is starting up
Jun 24 15:02:05 murphy postgres[28061]: [10-1] LOG: redo starts at 2710/1000090
Jun 24 15:02:05 murphy postgres[28061]: [11-1] PANIC: could not access status of transaction 15000030
Jun 24 15:02:05 murphy postgres[28061]: [11-2] DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/000E" at offset 73728: Success
Jun 24 15:02:05 murphy postgres[24771]: [5-1] LOG: startup process (PID 28061) was terminated by signal 6
Jun 24 15:02:05 murphy postgres[24771]: [6-1] LOG: aborting startup due to startup process failure
Jun 24 15:50:51 murphy sshd(pam_unix)[690]: session opened for user root by (uid=0)
Jun 24 15:54:47 murphy su(pam_unix)[1541]: session opened for user postgres by root(uid=0)
Jun 24 16:03:47 murphy su(pam_unix)[2911]: session opened for user postgres by root(uid=0)
Jun 24 16:03:48 murphy su(pam_unix)[2911]: session closed for user postgres
Jun 24 16:03:48 murphy postgres[3182]: [1-1] LOG: could not create IPv6 socket: Address family not supported by protocol
Jun 24 16:03:48 murphy postgres[3188]: [2-1] LOG: database system was interrupted while in recovery at 2004-06-24 15:02:05 GMT
Jun 24 16:03:48 murphy postgres[3188]: [2-2] HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
Jun 24 16:03:48 murphy postgres[3188]: [3-1] LOG: checkpoint record is at 2710/1000050
Jun 24 16:03:48 murphy postgres[3188]: [4-1] LOG: redo record is at 2710/1000050; undo record is at 0/0; shutdown TRUE
Jun 24 16:03:48 murphy postgres[3188]: [5-1] LOG: next transaction ID: 15000010; next OID: 750000000
Jun 24 16:03:48 murphy postgres[3188]: [6-1] LOG: database system was not properly shut down; automatic recovery in progress
Jun 24 16:03:48 murphy postgres[3188]: [7-1] LOG: redo starts at 2710/1000090
Jun 24 16:03:48 murphy postgres[3188]: [8-1] PANIC: could not access status of transaction 15000030
Jun 24 16:03:48 murphy postgres[3188]: [8-2] DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/000E" at offset 73728: Success
Jun 24 16:03:48 murphy postgres[3182]: [2-1] LOG: startup process (PID 3188) was terminated by signal 6
Jun 24 16:03:48 murphy postgres[3182]: [3-1] LOG: aborting startup due to startup process failure

How do I set the xlog properly, or rather to 10000/0?
Dan.
-----Original Message-----
From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
Sent: Wednesday, June 23, 2004 11:41 PM
To: Shea,Dan [CIS]
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] after using pg_resetxlog, db lost

"Shea,Dan [CIS]" <Dan(dot)Shea(at)ec(dot)gc(dot)ca> writes:
> Tom I see you from past emails that you reference using -i -f with
> pg_filedump. I have tried this, but do not know what I am looking at.

What you want to look at is valid XMIN and XMAX values. In this
example:

> Item 1 -- Length: 196 Offset: 4292 (0x10c4) Flags: USED
> XID: min (2) CMIN|XMAX: 211 CMAX|XVAC: 469
> Block Id: 0 linp Index: 1 Attributes: 24 Size: 28
> infomask: 0x2912 (HASVARWIDTH|HASOID|XMIN_COMMITTED|XMAX_INVALID|UPDATED)

the infomask shows XMIN_COMMITTED, so xmin (here 2) is valid, but it also
shows XMAX_INVALID, so the putative XMAX (211) should be ignored.

In general the xmin field should be valid, but xmax shares storage with
cmin and so you have to look at the infomask bits to know whether to
believe that the cmin/xmax field represents a transaction ID.

The cmax/xvac field could also hold a transaction ID. If I had only
the above data to go on, I'd guess that the current transaction counter
is at least 469.

Under normal circumstances, command counter values (cmin or cmax) are
unlikely to exceed a few hundred, while the transaction IDs you are
looking for are likely to be much larger. So you could get away with
just computing the max of *all* the numbers you see in xmin, cmin/xmax,
or cmax/cvac, and then using something a million or so bigger for safety
factor.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Shea,Dan [CIS]" <Dan(dot)Shea(at)ec(dot)gc(dot)ca>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: after using pg_resetxlog, db lost
Date: 2004-06-24 16:33:41
Message-ID: 15230.1088094821@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Shea,Dan [CIS]" <Dan(dot)Shea(at)ec(dot)gc(dot)ca> writes:
> I determined the largest was 12,293,162 and set it to
> pg_resetxlog -x 15000000 /var/lib/pgsql/data

Okay, but it looks like you will also need to adjust pg_clog to cover
that transaction ID range. (I had thought pg_resetxlog would handle
this for you, but it looks like not.)

> Jun 24 15:02:05 murphy postgres[28061]: [11-1] PANIC: could not access status of transaction 15000030
> Jun 24 15:02:05 murphy postgres[28061]: [11-2] DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/000E" at offset 73728: Success

You need to append zeroes (8K at a time) to
/var/lib/pgsql/data/pg_clog/000E until it's longer than 73728 bytes.
I'd use something like
dd bs=8k count=1 </dev/zero >>/var/lib/pgsql/data/pg_clog/000E
assuming that your system has /dev/zero.

regards, tom lane