Re: Corrupt index

From: Amir Becher <abecher(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Corrupt index
Date: 2003-04-10 20:32:47
Message-ID: 20030410203247.63208.qmail@web13902.mail.yahoo.com
Lists: pgsql-general

The source of the problem was the VERITAS Backup Exec.
I was able to replicate the index corruption in under
a minute by running a backup and updating the database
at the same time. I had never been able to replicate
the problem before because I was testing during the
day, when the backup was not running.

As far as backups are concerned, we will no longer
back up the data directory itself - that was clearly a
dumb thing to do in the first place. We have actually
been backing up the data using pg_dumpall as well (so
there is still hope for us).
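
For anyone finding this thread later, here is a rough sketch of the kind
of nightly pg_dumpall job I mean, as opposed to copying the data
directory while the server is running. It is only an illustration: the
"postgres" user, the backup directory and the file naming are
assumptions, not our actual setup. pg_dumpall writes its SQL script to
standard output, so the sketch just redirects that into a file.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def dump_command(user="postgres"):
    """Build the pg_dumpall invocation (the user name is an assumption).

    With no output option given, pg_dumpall writes the SQL script to stdout.
    """
    return ["pg_dumpall", "-U", user]


def dump_path(backup_dir, now=None):
    """Return a timestamped target path inside backup_dir (hypothetical layout)."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    return Path(backup_dir) / f"pg_dumpall-{now:%Y%m%dT%H%M%SZ}.sql"


def run_dump(backup_dir, user="postgres"):
    """Run pg_dumpall and keep the result only if the command succeeded."""
    target = dump_path(backup_dir)
    partial = target.with_name(target.name + ".partial")
    with open(partial, "wb") as out:
        # check=True raises CalledProcessError on a non-zero exit status,
        # so a failed dump is never renamed to the final file name.
        subprocess.run(dump_command(user), stdout=out, check=True)
    partial.rename(target)
    return target
```

Calling run_dump("/var/backups/pg") from cron (the path is made up) would
leave one .sql file per run. The dump is a plain SQL script that is
replayed with psql, and it is this file, not the live data directory,
that the tape backup should pick up.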

Thanks for all the help - I greatly appreciate it.

--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Amir Becher <abecher(at)yahoo(dot)com> writes:
> > I don't know if this may have something to do with it,
> > but we do backup the data every night using VERITAS
> > Backup Exec. We are not restoring anything, though
> > (the data is backed up to tape). The VERITAS software
> > runs on Windows, but there is an agent that runs on
> > our Linux box where the PostgreSQL data is stored. I
> > should also mention that the backup is running while
> > the database is being modified (we modify the database
> > 24/7).
>
> You're wasting your time making such a backup --- if you ever have to
> use it, it'll be corrupt, because the individual files in the database
> won't be in sync. But that's not the immediate problem.
>
> > There is another unexpected behavior that I noticed
> > for the first time this morning (so I am not sure if
> > it's recurring, related or relevant). The database
> > "blinked" in the sense that all database connections
> > were lost - but new connections could be obtained
> > immediately after the "blink". The error message that
> > I got said something about possible "corrupted shared
> > memory" and I guess the shutting down of the
> > connections was a precautionary measure.
>
> That sounds like a backend crash, all right. Given that, I'm thinking
> that you have more extensive problems than just this one symptom. The
> odds are good that it's a hardware issue, because we haven't heard any
> reports of comparable misbehavior from anyone else.
>
> I'd recommend running some hardware diagnostics --- memtest86 and
> badblocks seem to be the most widely used, although they aren't always
> able to find problems.
>
> It would also be a good idea to start taking some *real* backups, using
> pg_dump or pg_dumpall. You will be lucky if you don't find any more
> serious corruption in the database, if I'm right that there's hardware
> flakiness involved. You may find yourself forced to initdb and restore
> from a backup, so you'd better have one.
>
> regards, tom lane

