Re: Proposal: Integrity check

Lists: pgsql-hackers
From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Proposal: Integrity check
Date: 2008-01-25 16:56:34
Message-ID: 479A14C2.7020203@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Regarding to Robert Mach's work during Google SOC on data integrity
check. I would like to improve storage module and implement some
Robert's code into the core.

I would like to make following modification:

1) Add ReadBuffer_noerror (recommend me better name) function which will
accept damaged page without Error. This page will be marked as corrupted
and when ReadBuffer will touch this page then it will be handled in
standard way.

This is important for check and repair functions to process all table
without interruption.

2) Extend PageHeaderIsValid function reporting.

a) split one condition to more and report each problem separately
b) check linp if they point to correct place (occupied space)
c) check overlaying

Because some tests are time expensive, IntegrityCheckLevel variable will
specify how many test will be performed.

3) Add PageHeaderIsValid check also for write operation

In production it should catch problem with memory or software bugs. In
development it should catch memory overwriting.

4) Add special command (CHECK/VERIFY) or function which validates table
or whole database.

Any comments?

Thanks Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 17:27:22
Message-ID: 2221.1201282042@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> I would like to make following modification:

> 1) Add ReadBuffer_noerror (recommend me better name) function which will
> accept damaged page without Error. This page will be marked as corrupted
> and when ReadBuffer will touch this page then it will be handled in
> standard way.

This seems like a pretty horrid idea. Bad pages shouldn't be allowed to
get into shared buffers in the first place. Why not have the checking
logic operate outside shared buffers?

> 3) Add PageHeaderIsValid check also for write operation

> In production it should catch problem with memory or software bugs. In
> development it should catch memory overwriting.

Is there any evidence whatsoever to demonstrate that this is worth the
cycles it will eat?

regards, tom lane


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 18:03:10
Message-ID: 479A245E.9080506@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
>> I would like to make following modification:
>
>> 1) Add ReadBuffer_noerror (recommend me better name) function which will
>> accept damaged page without Error. This page will be marked as corrupted
>> and when ReadBuffer will touch this page then it will be handled in
>> standard way.
>
> This seems like a pretty horrid idea. Bad pages shouldn't be allowed to
> get into shared buffers in the first place. Why not have the checking
> logic operate outside shared buffers?

It currently works outside the shared buffers, but I afraid about
collision due to parallel read and write access on one block. I'm not
sure if parallel write(8k) and read(8k) is synchronized by kernel/fs or
not. If not it should generates false positive results. If yes than I'm
happy :-) with outside processing.

>> 3) Add PageHeaderIsValid check also for write operation
>
>> In production it should catch problem with memory or software bugs. In
>> development it should catch memory overwriting.
>
> Is there any evidence whatsoever to demonstrate that this is worth the
> cycles it will eat?

Alex from clickware tries this modification to catch their problem with
random damaged database. But, I don't have any result yet.

Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 18:54:36
Message-ID: 3479.1201287276@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> Tom Lane wrote:
>> This seems like a pretty horrid idea. Bad pages shouldn't be allowed to
>> get into shared buffers in the first place. Why not have the checking
>> logic operate outside shared buffers?

> It currently works outside the shared buffers, but I afraid about
> collision due to parallel read and write access on one block. I'm not
> sure if parallel write(8k) and read(8k) is synchronized by kernel/fs or
> not. If not it should generates false positive results. If yes than I'm
> happy :-) with outside processing.

We're already assuming that; otherwise base backups for PITR don't work.

regards, tom lane


From: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Robert Mach" <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 20:01:59
Message-ID: E1539E0ED7043848906A8FF995BDA57902B63125@m0143.s-mxs.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> >> This seems like a pretty horrid idea. Bad pages shouldn't be
allowed to
> >> get into shared buffers in the first place. Why not have the
checking
> >> logic operate outside shared buffers?
>
> > It currently works outside the shared buffers, but I afraid about
> > collision due to parallel read and write access on one block. I'm
not
> > sure if parallel write(8k) and read(8k) is synchronized by kernel/fs
or
> > not. If not it should generates false positive results. If yes than
I'm
> > happy :-) with outside processing.
>
> We're already assuming that; otherwise base backups for PITR
> don't work.

I think we could, but iirc we did not. We do not need that assumption if
you don't
turn off fullpage writes. All pages that could potentially be changed
during the
backup exist in the WAL fullpages that have to be replayed.
Don't we even now allways write fullpages to WAL during backup exactly
because we
did not confirm that assumption ?

I think I recall times when some OS had trouble with this when you mixed
O_DIRECT (or was it also O_DATASYNC) and normal IO.

Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
Cc: "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Robert Mach" <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 20:23:58
Message-ID: 15152.1201292638@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at> writes:
>> We're already assuming that; otherwise base backups for PITR
>> don't work.

> I think we could, but iirc we did not. We do not need that assumption if
> you don't
> turn off fullpage writes.

Oh, I had forgotten that RestoreBkpBlocks restores unconditionally.
Good point.

It's still the case that there's no need to allow pages into shared
buffers that don't pass the header-is-valid checks. If they don't
pass, then there's no way that bufmgr has a conflicting copy.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Zdenek Kotala" <Zdenek(dot)Kotala(at)Sun(dot)COM>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Robert Mach" <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 20:58:43
Message-ID: 8763xh62rg.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
>> Tom Lane wrote:
>>> This seems like a pretty horrid idea. Bad pages shouldn't be allowed to
>>> get into shared buffers in the first place. Why not have the checking
>>> logic operate outside shared buffers?
>
>> It currently works outside the shared buffers, but I afraid about
>> collision due to parallel read and write access on one block. I'm not
>> sure if parallel write(8k) and read(8k) is synchronized by kernel/fs or
>> not. If not it should generates false positive results. If yes than I'm
>> happy :-) with outside processing.
>
> We're already assuming that; otherwise base backups for PITR don't work.

Don't full page writes protect us? I thought you had to start applying logs
forward from the point the backup started. Should we be forcing full page
writes during the base backup, perhaps treating the start of the base backup
as if it was a checkpoint as far as triggering full page writes?

I have little confidence that write(2) and read(2) calls are synchronized in
this manner when postgres's page size is larger than the filesystem block
size. Certainly I doubt someone issuing a write(2) of a few megabytes is going
to stop another process from being able to issue reads against the same file
until those megabytes are all in kernel cache, let alone if they overflow
cache and force i/o.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-25 22:13:52
Message-ID: 1201299232.4257.528.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2008-01-25 at 17:56 +0100, Zdenek Kotala wrote:
> Regarding to Robert Mach's work during Google SOC on data integrity
> check. I would like to improve storage module and implement some
> Robert's code into the core.
>
> I would like to make following modification:
>
> 1) Add ReadBuffer_noerror (recommend me better name) function which will
> accept damaged page without Error. This page will be marked as corrupted
> and when ReadBuffer will touch this page then it will be handled in
> standard way.
>
> This is important for check and repair functions to process all table
> without interruption.

We shouldn't let duff data into shared buffers at all.

I think you could mix the two methods of reading buffers

- start a subtransaction
- read blocks into shared buffers
- if error, then re-read block into private memory and examine
- carry on thru table in a new subtransaction

OK with other points, except I don't want a new command. Let's do it as
a function that can accept block ranges to check, not just whole tables.
e.g. pg_check_blocks(17, 43) would check blocks 17 -> 43

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-01-28 12:16:12
Message-ID: 479DC78C.6030608@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Fri, 2008-01-25 at 17:56 +0100, Zdenek Kotala wrote:
>> Regarding to Robert Mach's work during Google SOC on data integrity
>> check. I would like to improve storage module and implement some
>> Robert's code into the core.
>>
>> I would like to make following modification:
>>
>> 1) Add ReadBuffer_noerror (recommend me better name) function which will
>> accept damaged page without Error. This page will be marked as corrupted
>> and when ReadBuffer will touch this page then it will be handled in
>> standard way.
>>
>> This is important for check and repair functions to process all table
>> without interruption.
>
> We shouldn't let duff data into shared buffers at all.

As Tom mentioned before. I agree, it could cause a lot of problems.

> I think you could mix the two methods of reading buffers
>
> - start a subtransaction
> - read blocks into shared buffers
> - if error, then re-read block into private memory and examine
> - carry on thru table in a new subtransaction

It seems like good idea.

> OK with other points, except I don't want a new command. Let's do it as
> a function that can accept block ranges to check, not just whole tables.
> e.g. pg_check_blocks(17, 43) would check blocks 17 -> 43

It makes sense. I think following function should cover all cases:

pg_check_blocks() - all db
pg_check_blocks(relno) - all relation
pg_check_blocks(relno, start, stop) - selected interval
pg_check_blocks(relno, array of blocks) - selected blocks

Zdenek


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-05-12 17:19:17
Message-ID: 20080512171917.GO5655@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala escribió:
> Regarding to Robert Mach's work during Google SOC on data integrity
> check. I would like to improve storage module and implement some
> Robert's code into the core.

Did this go anywhere?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Mach <mach(dot)robert(at)gmail(dot)com>
Subject: Re: Proposal: Integrity check
Date: 2008-05-12 19:12:12
Message-ID: 4828968C.1010909@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera napsal(a):
> Zdenek Kotala escribió:
>> Regarding to Robert Mach's work during Google SOC on data integrity
>> check. I would like to improve storage module and implement some
>> Robert's code into the core.
>
> Did this go anywhere?
>

I did not catch May commit fest :(. I plan to send core improvement to next
commit fest.

Zdenek