Quick Links

Re: could not read block 77 of relation 1663/16385/388818775

Lists:	pgsql-bugs

From:	Alexandra Nitzschke <an(at)clickware(dot)de>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 10:22:22
Message-ID:	49253A5E.6000602@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Hi,

we encountered the following error while inserting a record into a table:

org.postgresql.util.PSQLException: ERROR: could not read block 77 of relation 1663/16385/388818775: read only 0 of 8192
bytes

Using postgres 8.3.5

The reported object is an index.
The size of its data file is 630784 bytes.

The output of pg_filedump of block 76 is ok.
The output of pg_filedump of block 77 is the following:

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 8.3.0
*
* File: /tmp/388818775
* Options used: -i -f -R 77
*
* Dump created on: Thu Nov 20 11:18:26 2008
*******************************************************************
Error: Premature end of file encountered.

If you like we could send the data file to you.

Regards,

A. Nitzschke

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Alexandra Nitzschke <an(at)clickware(dot)de>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 11:33:04
Message-ID:	49254AF0.8020901@postnewspapers.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

This is probably a problem with your disk or filesystem. Have you
checked your disks and file system, checked your system logs for disk
errors, made sure your RAID array is in good condition, etc?

You should be able to fix it by REINDEXing the problem index. You can
find out which index it is from pg_catalog, though if you just REINDEX
the table(s) being operated on by the query that should work too.

Consider making a copy of your database and your log files before you
REINDEX in case one of the developers thinks it might actually be caused
by a PostgreSQL bug and wants to have a look.

--
Craig Ringer

From:	Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To:	Alexandra Nitzschke <an(at)clickware(dot)de>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 11:36:50
Message-ID:	49254BD2.6010205@usit.uio.no
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Alexandra Nitzschke wrote:
> Hi,
>
> we encountered the following error while inserting a record into a table:
>
> org.postgresql.util.PSQLException: ERROR: could not read block 77 of
> relation 1663/16385/388818775: read only 0 of 8192 bytes
>
> Using postgres 8.3.5
>
> The reported object is an index.
> The size of its data file is 630784 bytes.
>

Hello

You have probably a harware issue and data in your disk is corrupted.

Probably a reindex of this index will fix this problem. But I would take
a backup of this system right away just in case your disk is dying.

regards
--
Rafael Martinez, <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

From:	Alexandra Nitzschke <an(at)clickware(dot)de>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 15:43:28
Message-ID:	492585A0.7090500@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Hi,

we have had similar postgres problems in the past.
Please have a look at Bug 3484.

We didn't resolve the problems metioned in bug 3484. The other postgres developers also thought, that there are hardware
problems.
So our customer bought a new server with diffrent hardware configuration ( ... and NEW hardware drives ... ).
The error today encountered on the new machine. Just running under heavy load since two days.

We ran two file checks. One normal with no error result.
One deep scan (e2fsck -fcn) searching for corrupted blocks. No error was reported.

It really looks unlikely having a hardware problem now.

Here are some details about the servers:

*NEW* server:
postgres 8.3.5
SUSE 10.3
Kernel 2.6.22.19-0.1-default
Supermicro-Mainboard + 2x AMD Opteron Dual Core 2218 2,60 GHz
4x 1024MB ECC Registered DDR2 Ram ATP
3Ware 9650 4x SATA

*OLD* server:
postgres 8.3.5
SUSE 10.0
Kernel 2.6.22.6-smp
Intel-Mainboard + 2x Intel XEON 2,80 GHz 2MB FSB800
4x 1024MB ECC Registered DDR2 RAM
3Ware Raid Controller 9500S-4

Regards,

A. Nitzschke

Craig Ringer schrieb:
> Alexandra Nitzschke wrote:
>> Hi,
>>
>> we encountered the following error while inserting a record into a table:
>>
>> org.postgresql.util.PSQLException: ERROR: could not read block 77 of
>> relation 1663/16385/388818775: read only 0 of 8192 bytes
>
> This is probably a problem with your disk or filesystem. Have you
> checked your disks and file system, checked your system logs for disk
> errors, made sure your RAID array is in good condition, etc?
>
> You should be able to fix it by REINDEXing the problem index. You can
> find out which index it is from pg_catalog, though if you just REINDEX
> the table(s) being operated on by the query that should work too.
>
> Consider making a copy of your database and your log files before you
> REINDEX in case one of the developers thinks it might actually be caused
> by a PostgreSQL bug and wants to have a look.
>
> --
> Craig Ringer
>
>

From:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To:	Alexandra Nitzschke <an(at)clickware(dot)de>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 18:36:33
Message-ID:	4925AE31.90504@postnewspapers.com.au
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Alexandra Nitzschke wrote:
> Hi,
>
> we have had similar postgres problems in the past.
> Please have a look at Bug 3484.
>
> We didn't resolve the problems metioned in bug 3484. The other postgres
> developers also thought, that there are hardware
> problems.
> So our customer bought a new server with diffrent hardware configuration
> ( ... and NEW hardware drives ... ).
> The error today encountered on the new machine. Just running under heavy
> load since two days.

Yes, that does seem somewhat unlikely, especially if in both cases
you've only seen issues with PostgreSQL. However, I'm a bit confused
about the fact that you're seeing apparent corruption all over the place
- your earlier report mentions damaged blocks across a number of
relations, and this one is a bad index. You'd expect this sort of thing
to come up a lot on the list, so it must be assumed that there's
something a bit unusual or different about your configuration that's
either triggering a hard-to-hit bug in PostgreSQL, or that's damaging
PostgreSQL's data somehow.

Is there any chance you have EVER hard-killed the postmaster manually
(eg with "kill -9" or "kill -KILL")? If you do that and don't also kill
the backends, it's my understanding that BAD things may happen
especially if you then attempt to relaunch the postmaster.

Do you use _any_ 3rd party C extensions? Contrib modules? It doesn't
have to be in the same database, another database on the same machine
could be bad too.

Do you have any unusual workload? What is your workload like?

What procedural languages, if any, do you use? Pl/PgSQL? Pl/Perl?
Pl/Java? Pl/Python? etc. Again, in any database, not just your problem
one. If you use any other than Pl/PgSQL please also note the version of
the language interpreter/tools and in the case of Java the JVM vendor &
install method.

Does your site possibly have dodgy power? Are the servers on a UPS?

Have the servers had any crashes, kernel panics, unexpected reboots, or
hard poweroffs?

(Not that it should matter, but): Have you hard killed any backends
(kill -9 / SIGKILL)?

If you run a RAID verify using tw_cli or through the 3dm web interface,
does it report any block mismatches in the array?

--
Craig Ringer

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc:	Alexandra Nitzschke <an(at)clickware(dot)de>, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 18:59:09
Message-ID:	4925B37D.1010209@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Craig Ringer wrote:
> Is there any chance you have EVER hard-killed the postmaster manually
> (eg with "kill -9" or "kill -KILL")? If you do that and don't also kill
> the backends, it's my understanding that BAD things may happen
> especially if you then attempt to relaunch the postmaster.

There is safeguards against that. If postmaster dies, the backends
should die quickly and gracefully too. And postmaster refuses to restart
until all the backends have died and detached from the shared memory
segment.

In addition to Craig's question: have you ever experienced sudden power
loss, or operating system crash on these machines? Have you done "kill
-9 postmaster", "pg_ctl stop -m immediate", or similar? PostgreSQL
should recover with no data corruption, of course, but if there's a bug
somewhere, it would help to know where to look.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Alexandra Nitzschke <an(at)clickware(dot)de>, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-20 20:26:49
Message-ID:	7377.1227212809@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Craig Ringer wrote:
>> Is there any chance you have EVER hard-killed the postmaster manually
>> (eg with "kill -9" or "kill -KILL")?

> There is safeguards against that. If postmaster dies, the backends
> should die quickly and gracefully too. And postmaster refuses to restart
> until all the backends have died and detached from the shared memory
> segment.

A sufficiently bull-headed DBA can defeat those safeguards though ---
from memory, manually removing postmaster.pid and then starting a new
postmaster while old backends remain alive is the ticket to data
corruption.

regards, tom lane

From:	Alexandra Nitzschke <an(at)clickware(dot)de>
To:	pgsql-bugs(at)postgresql(dot)org
Cc:	Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-21 14:49:39
Message-ID:	4926CA83.5000304@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Hi,

here is some information about the server

- no other database system runs on the server
- suse 10.3, standard installation
- jvm 1.5.0_16
- as interface to the database we use jdbc version 8.1-407
- as procdural language we only use pl/pgsql
- we have no specials in the database like custom dataypes an so on

We have had a look at the /var/log files, no system crash, kernel panic or messages like this has happened.

Running a RAID verify reported no error.

We have setup a new database on 2008/11/17 after updating to 8.3.5.
Since then we didn't kill the postmaster manually or any backend process using kill -9.
Just normal stop/start/restart, no manually deletion of postmaster-pid or something like that.

Let me tell the circumstances:

To handle any failures of the system, we build a "pair" of servers.
One server is primary server and one is stand by server.
In case of any system failure we are able to switch the servers and can use the "old" stand by as "new" primary.

For database replication we use the "Warm Standby Using Point-In-Time Recovery" method.

Last Friday I updated postgres to 8.3.5 on the primary server and setup a new database and insert a dump from wednesday
using the pg_restore utility.

This monday I updated postgres to 8.3.5 on the standby server.
After that I intialized the database
( copy once the database from the primary system: removing data/* on stand-by, setting the database on primary in
backup-modus and then copy the database files )
and startet the recovery-mode on standby and the WAL replication from primary.
On Tuesday we have done a switch test and started up the database on the standby and use it as primary server.
Everything works fine until yesterday morning.

Our workload:
Over night we retrieve a lot of data and insert it into the database ( ~300000 - ~700000 recordsets into diffrent tables
but three main tables ).
In the morning we do some processing, but mostly selects.
During the day people work on the webapp on the system, but do mostly selects.
Every evening runs "vacuum analyze".

Regards,

A. Nitzschke

Craig Ringer schrieb:
> Alexandra Nitzschke wrote:
>> Hi,
>>
>> we have had similar postgres problems in the past.
>> Please have a look at Bug 3484.
>>
>> We didn't resolve the problems metioned in bug 3484. The other postgres
>> developers also thought, that there are hardware
>> problems.
>> So our customer bought a new server with diffrent hardware configuration
>> ( ... and NEW hardware drives ... ).
>> The error today encountered on the new machine. Just running under heavy
>> load since two days.
>
> Yes, that does seem somewhat unlikely, especially if in both cases
> you've only seen issues with PostgreSQL. However, I'm a bit confused
> about the fact that you're seeing apparent corruption all over the place
> - your earlier report mentions damaged blocks across a number of
> relations, and this one is a bad index. You'd expect this sort of thing
> to come up a lot on the list, so it must be assumed that there's
> something a bit unusual or different about your configuration that's
> either triggering a hard-to-hit bug in PostgreSQL, or that's damaging
> PostgreSQL's data somehow.
>
> Is there any chance you have EVER hard-killed the postmaster manually
> (eg with "kill -9" or "kill -KILL")? If you do that and don't also kill
> the backends, it's my understanding that BAD things may happen
> especially if you then attempt to relaunch the postmaster.
>
> Do you use _any_ 3rd party C extensions? Contrib modules? It doesn't
> have to be in the same database, another database on the same machine
> could be bad too.
>
> Do you have any unusual workload? What is your workload like?
>
> What procedural languages, if any, do you use? Pl/PgSQL? Pl/Perl?
> Pl/Java? Pl/Python? etc. Again, in any database, not just your problem
> one. If you use any other than Pl/PgSQL please also note the version of
> the language interpreter/tools and in the case of Java the JVM vendor &
> install method.
>
> Does your site possibly have dodgy power? Are the servers on a UPS?
>
> Have the servers had any crashes, kernel panics, unexpected reboots, or
> hard poweroffs?
>
> (Not that it should matter, but): Have you hard killed any backends
> (kill -9 / SIGKILL)?
>
> If you run a RAID verify using tw_cli or through the 3dm web interface,
> does it report any block mismatches in the array?
>
> --
> Craig Ringer
>
>
>

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexandra Nitzschke <an(at)clickware(dot)de>
Cc:	pgsql-bugs(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-21 17:35:31
Message-ID:	9118.1227288931@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Alexandra Nitzschke <an(at)clickware(dot)de> writes:
> We have had a look at the /var/log files, no system crash, kernel panic or messages like this has happened.

What this smells like is a failed page split --- somewhere in the index
there is a down-link pointing at page 77, but page 77 didn't actually
get added to the file.

If there was no system crash or hardware misfeasance then that'd
represent a Postgres bug, but it's difficult to do much about it
unless you can come up with a reproducible sequence to trigger the bug.

(Also, since you didn't mention otherwise, I assume this is a btree
index?)

regards, tom lane

From:	Alexandra Nitzschke <an(at)clickware(dot)de>
To:	pgsql-bugs(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-21 18:19:44
Message-ID:	4926FBC0.5070506@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Yes, its a btree.

Tom Lane schrieb:
> Alexandra Nitzschke <an(at)clickware(dot)de> writes:
>> We have had a look at the /var/log files, no system crash, kernel panic or messages like this has happened.
>
> What this smells like is a failed page split --- somewhere in the index
> there is a down-link pointing at page 77, but page 77 didn't actually
> get added to the file.
>
> If there was no system crash or hardware misfeasance then that'd
> represent a Postgres bug, but it's difficult to do much about it
> unless you can come up with a reproducible sequence to trigger the bug.
>
> (Also, since you didn't mention otherwise, I assume this is a btree
> index?)
>
> regards, tom lane
>
>

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alexandra Nitzschke <an(at)clickware(dot)de>
Cc:	pgsql-bugs(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-21 18:26:02
Message-ID:	10343.1227291962@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Alexandra Nitzschke <an(at)clickware(dot)de> writes:
> Yes, its a btree.

Well, the btree code is sufficiently well tested/debugged that I think
there's zero chance of finding such a bug in it just on the suspicion
that there might be one there. We really need a test case.

regards, tom lane

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	Alexandra Nitzschke <an(at)clickware(dot)de>
Cc:	pgsql-bugs(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-22 22:56:15
Message-ID:	87prknz7xs.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Alexandra Nitzschke <an(at)clickware(dot)de> writes:

> This monday I updated postgres to 8.3.5 on the standby server.
> After that I intialized the database
> ( copy once the database from the primary system: removing data/* on stand-by,
> setting the database on primary in backup-modus and then copy the database
> files )

Uhm, just to be sure. You did pg_start_backup() on the primary *before* you
started copying the data files across, right?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning

From:	Alexandra Nitzschke <an(at)clickware(dot)de>
To:	pgsql-bugs(at)postgresql(dot)org
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-24 14:40:48
Message-ID:	492ABCF0.9030807@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Yes, of course.

After we started up the copied database, the system runs withour error for two days.
That means some recordsets has been inserted sucessfully already.

Gregory Stark schrieb:
> Alexandra Nitzschke <an(at)clickware(dot)de> writes:
>
>> This monday I updated postgres to 8.3.5 on the standby server.
>> After that I intialized the database
>> ( copy once the database from the primary system: removing data/* on stand-by,
>> setting the database on primary in backup-modus and then copy the database
>> files )
>
> Uhm, just to be sure. You did pg_start_backup() on the primary *before* you
> started copying the data files across, right?
>

From:	Alexandra Nitzschke <an(at)clickware(dot)de>
To:	pgsql-bugs(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-24 16:55:14
Message-ID:	492ADC72.3080109@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Hi,

first of all I would like to thank you for all your efforts.

> We really need a test case.

unfortunately this kind of bugs tend to be non-reproducable. I assume that there is a race condition which is only hit
in rare cases, under heavy load and when mars and venus are exactly aligned... ;-)

I do not think it is possible to construct a test case that reliably reproduces the bug.

However, we would be glad to incorporate any patches, additional logging code etc. in our installation of postgres that
might help you to track the bug. Since we always build postgres on the production machine this would not be any problem.

Unfortunately we handle very sensitive data in our databases, so we cannot give you direct access to our machines. As a
last resort I would propose the following (provided that our customer agrees):

We set up another machine and feed it with obfuscated data, putting it under the same load as our production machine. We
could then give you root access to that machine and you could do whatever patching, monitoring, testing etc. might be
helpful to track the problem. Do you think this might help?

BTW... how about a block checksum that is checked just before writing a block and just after reading it? I know this
would degrade performance, but I think we can afford that. Would it be possible to incorporate such code without having
to do too much patching?

Thanks in advance

Alexandra Nitzschke
Thomas Goerner

Tom Lane schrieb:
> Alexandra Nitzschke <an(at)clickware(dot)de> writes:
>> Yes, its a btree.
>
> Well, the btree code is sufficiently well tested/debugged that I think
> there's zero chance of finding such a bug in it just on the suspicion
> that there might be one there. We really need a test case.
>
> regards, tom lane
>
>

From:	John R Pierce <pierce(at)hogranch(dot)com>
To:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-24 17:20:13
Message-ID:	492AE24D.7090107@hogranch.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Alexandra Nitzschke wrote:
> BTW... how about a block checksum that is checked just before writing
> a block and just after reading it? I know this would degrade
> performance, but I think we can afford that. Would it be possible to
> incorporate such code without having to do too much patching?

oracle has had an option for some time that uses read/only page
protection for each page of the shared buffer area... when oracle
knows it wants to modify a page, it un-protects it via a system
call. this catches any wild writes into the shared buffer area as a
memory protection fault.

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	John R Pierce <pierce(at)hogranch(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-26 09:38:24
Message-ID:	87iqqarfn3.fsf@oxford.xeocode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

John R Pierce <pierce(at)hogranch(dot)com> writes:

> Alexandra Nitzschke wrote:
>> BTW... how about a block checksum that is checked just before writing a block
>> and just after reading it? I know this would degrade performance, but I think
>> we can afford that. Would it be possible to incorporate such code without
>> having to do too much patching?
>
> oracle has had an option for some time that uses read/only page protection for
> each page of the shared buffer area... when oracle knows it wants to modify a
> page, it un-protects it via a system call. this catches any wild writes
> into the shared buffer area as a memory protection fault.

The problem with both of these approaches is that most bugs occur when the
code *thinks* it's doing the right thing. A bug in the buffer management code
which returns the wrong buffer or a real wild pointer dereference. I don't
remember ever having either of those.

That said, the second option seems pretty trivial to implement. I think the
performance would be awful for a live database but for a read-only database it
might make more sense.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

From:	John R Pierce <pierce(at)hogranch(dot)com>
To:	Gregory Stark <stark(at)enterprisedb(dot)com>
Cc:	pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-26 10:06:24
Message-ID:	492D1FA0.8020202@hogranch.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Gregory Stark wrote:
> John R Pierce <pierce(at)hogranch(dot)com> writes:
>
>
>> oracle has had an option for some time that uses read/only page protection for
>> each page of the shared buffer area... when oracle knows it wants to modify a
>> page, it un-protects it via a system call. this catches any wild writes
>> into the shared buffer area as a memory protection fault.
>>
>
> The problem with both of these approaches is that most bugs occur when the
> code *thinks* it's doing the right thing. A bug in the buffer management code
> which returns the wrong buffer or a real wild pointer dereference. I don't
> remember ever having either of those.
>
> That said, the second option seems pretty trivial to implement. I think the
> performance would be awful for a live database but for a read-only database it
> might make more sense.
>

FWIW, it has modest overhead on Oracle on Solaris on Sparc... EXCEPT on
the "Niagra" aka 'Coolthreads' CPUs (the T1 processor), on that it was
horribly slow on our write intensive transactional system. Our
environment is on very large scale servers where the shared buffers are
often 32 or 64GB, I suspect this increases our exposure to bizarro-world
writes.

believe me, especially in earlier Oracle releases (6, 7, 8), this
caught/prevented many problems which otherwise would have ended in a
Oracle fatal Block Corruption error, which would require many hours of
DBA hackery before the database could be restarted.

From:	Marc Schablewski <ms(at)clickware(dot)de>
To:	John R Pierce <pierce(at)hogranch(dot)com>
Cc:	Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-27 09:49:57
Message-ID:	492E6D45.5090703@clickware.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

I think both approaches (checksum and write protection) might contribute
to finding this bug. If pages with bogus data but correct checksum are
ever found on disk, I think this would prove that there is no hardware /
file system / os issue.

If an access violation resulting from writes to locked pages were hit,
would it be possible to log a stack backtrace?

Especially on our test systems we can easily afford any performance
degradations resulting from this.

Question: Who is responsible for maintaining this part (buffer cache
maintenance, writer etc) of postgres code?
Could you provide the necessary patches?

Thanks in advance

Thomas Goerner
Marc Schablewski

John R Pierce wrote:
> Gregory Stark wrote:
>> John R Pierce <pierce(at)hogranch(dot)com> writes:
>>
>>
>>> oracle has had an option for some time that uses read/only page
>>> protection for
>>> each page of the shared buffer area... when oracle knows it wants
>>> to modify a
>>> page, it un-protects it via a system call. this catches any wild
>>> writes
>>> into the shared buffer area as a memory protection fault.
>>>
>>
>> The problem with both of these approaches is that most bugs occur
>> when the
>> code *thinks* it's doing the right thing. A bug in the buffer
>> management code
>> which returns the wrong buffer or a real wild pointer dereference. I
>> don't
>> remember ever having either of those.
>>
>> That said, the second option seems pretty trivial to implement. I
>> think the
>> performance would be awful for a live database but for a read-only
>> database it
>> might make more sense.
>>
>
>
> FWIW, it has modest overhead on Oracle on Solaris on Sparc... EXCEPT
> on the "Niagra" aka 'Coolthreads' CPUs (the T1 processor), on that it
> was horribly slow on our write intensive transactional system. Our
> environment is on very large scale servers where the shared buffers
> are often 32 or 64GB, I suspect this increases our exposure to
> bizarro-world writes.
>
> believe me, especially in earlier Oracle releases (6, 7, 8), this
> caught/prevented many problems which otherwise would have ended in a
> Oracle fatal Block Corruption error, which would require many hours of
> DBA hackery before the database could be restarted.
>
>
>

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Marc Schablewski <ms(at)clickware(dot)de>
Cc:	John R Pierce <pierce(at)hogranch(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject:	Re: could not read block 77 of relation 1663/16385/388818775
Date:	2008-11-27 10:22:08
Message-ID:	492E74D0.90501@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-bugs

Marc Schablewski wrote:
> If pages with bogus data but correct checksum are
> ever found on disk, I think this would prove that there is no hardware /
> file system / os issue.

No, it would only suggest that the issue is not in the filesystem or I/O
subsystem. Even then, it wouldn't catch bugs where the contents of one
block are copied over or swapped with another block. The checksum would
be calculated when a page is written to disk, so the corruption could
still be caused by faulty memory, memory bus, CPU or OS, while the page
sits in the buffer cache.

> If an access violation resulting from writes to locked pages were hit,
> would it be possible to log a stack backtrace?

I think you'd get a segmentation fault. With a core dump if the system
is configured so.

> Question: Who is responsible for maintaining this part (buffer cache
> maintenance, writer etc) of postgres code?

There's no named individuals, just the community in general.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com