Re: PG periodic Error on W2K

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Paul Lambert <paul(dot)lambert(at)autoledgers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PG periodic Error on W2K
Date: 2007-03-01 08:56:52
Message-ID: 20070301085652.GA27639@svr2.hagander.net
Lists: pgsql-general

On Thu, Mar 01, 2007 at 09:44:19AM +0900, Paul Lambert wrote:
> I'm running PG 8.2.3 on We doze 2000 Server. (Should I apologise for
> that up front to appease the masses?)
>
> I am periodically getting errors pop up on the server console of the
> following nature:
>
> The File or directory D:\PostgresQL\Data\global\pgstat.stat is corrupt
> and unreadable. Please run the Chkdsk utility.
>
> and
>
> The file or directory D: is corrupt and unreadable. Please run the
> Chkdsk utility.
>
> Now, per the errors suggestion I have run the chkdsk utility with a /X
> /F switch to do a complete check on reboot before mounting the volume.
>
> This showed no errors.
>
> I can also open the mentioned file - pgstat.stat - using notepad or any
> other program without mention of corruption and the data within the file
> looks to be uniform suggesting it is fine.
>
> Strangely enough, this error was being presented on the last server I
> had it running on, and was in fact one of the reasons I moved it - I
> assumed the error was due to dodgy disks but this seems a bit much of a
> coincidence.
>
> I know these errors are not coming directly from Postgres, but does
> anyone else have problems (or has had previously) of a similar nature or
> any suggestions on where it may be?

They are, as you say, generated by Windows, and not PostgreSQL. They're
a clear indication of either a hardware problem, a driver problem, or a
Windows bug (which we all know don't exist, so it must be one of the
first two).

They can *not* be caused by a bug in PostgreSQL - no more than a kernel
oops in Linux is the fault of PostgreSQL. Now, we do push the filesystem
and disk layer in an unusual way with the pgstats writes, given that we
rewrite the same file over and over and over and over again at very
short intervals. But nothing says we're not allowed to do that :-)

The reason you can open it with notepad is most likely that it's a
different file - the file is deleted and recreated at a rate of at least
twice per second, when there is activity happening in the database. The
error is more a "filesystem is broken" message than "this file is
broken".
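
If you want to exercise the disk stack with a similar access pattern
outside of PostgreSQL, something along these lines might do. This is
only a rough sketch and not the actual pgstat code: the file name
pgstat_like.stat, the function names and the iteration count are all
made up for illustration. It rewrites one small file over and over by
writing a temporary file and renaming it over the old one, then reads
it back and counts any mismatches or I/O errors.

```python
import os
import tempfile


def rewrite_stats_file(path: str, payload: bytes) -> None:
    """Write payload to a temp file beside path, then rename it over path."""
    tmp_path = path + ".tmp"  # hypothetical temp-file naming
    with open(tmp_path, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the disk layer
    os.replace(tmp_path, path)  # replaces an existing file on Windows too


def stress(directory: str, iterations: int = 200) -> int:
    """Rewrite and re-read one file repeatedly; return the failure count."""
    path = os.path.join(directory, "pgstat_like.stat")  # made-up name
    failures = 0
    for i in range(iterations):
        payload = f"iteration {i}\n".encode() * 64
        try:
            rewrite_stats_file(path, payload)
            with open(path, "rb") as f:
                if f.read() != payload:
                    failures += 1
        except OSError:
            failures += 1
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("failures:", stress(d))
```

Pointing the directory at the suspect volume instead of a temporary
directory would be the interesting case. A clean run proves little,
but errors here with PostgreSQL out of the picture would point firmly
at the disk, controller or driver.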

> As a side-note, this server is RAID controlled, the D drive has 3
> disks in the array - I would therefore have assumed that if there
> was a problem with one of the disks then the server would carry on using
> the other disks.

You would hope so. But the problem could be in the actual RAID
controller. There are a lot of el-cheapo RAID-boards out there that
really do more harm than good. Then there are of course a lot of very
nice controllers as well :-) Which one do you have?

Also, it could very well be a driver problem - have you verified that
you're on the latest version?

> I can find no performance degradation in Postgres, the service and
> connections et al. keep on operating as though there was nothing wrong,
> but the errors continue to pop up sporadically on the console.
>
> Thoughts? Ideas? Suggestions? Should I bugger off?

Thoughts: scary (for you).
Ideas: see above.
Suggestions: get it fixed. Next time it might be your datafile or WAL.
Bugger off: Nah, find out what it was and let us know instead :-)

On Wed, Feb 28, 2007 at 04:52:15PM -0800, Joshua D. Drake wrote:
> Try turning off stats. However you will need to run vacuum using some
> other method.

While this will (most likely) get rid of the message, you're only
treating the symptom and not the problem. As I said above, next time it
might be a file that contains important data.

//Magnus
