PD_ALL_VISIBLE flag warning

Lists: pgsql-admin
From: Jonathan Foy <thefoy(at)gmail(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: PD_ALL_VISIBLE flag warning
Date: 2010-04-01 13:10:39
Message-ID: g2o4b46b5f01004010610ib8625426uae6ee90ac1435ba1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hello

I came in this morning and noticed this warning sitting in my inbox quite a
few times...

WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
page 471118
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
page 471119
WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
page 471120
....

and I'm wondering how worked up I should be getting.

I'm trying to do due diligence on my side and am digging through archives,
but just in case it's a cause for greater alarm than I think it is I'm going
to throw this out there and see what those more experienced that I say.

Postgres version 8.4.1, redhat 5.3.

The table in question is about 6 million rows, a single partition in a
larger set. A relatively heavy load (mass insert followed by a mass update)
takes place each morning, during the time period where I was seeing this
error. Auto-vacuum apparently kicked off at this time as well (auto-analyze
is showing last-run at 4:27 AM, auto-vacuum at 8:48). The first warning
kicked off at 4:55.

My assumption here is that the load from the inserts/updates was interfering
with auto-vacuum in some way, and resulted in the above warnings. Is this a
reasonable assumption at all? I'm under the impression that auto-vacuum
will kill itself if it notices that it's hurting performance, but I wouldn't
think it should throw all of these warnings...does it indicate poor
auto-vacuum configuration? I admit that said configuration is an area that
I've been meaning to look into more (upgraded from 8.1 where we didn't use
auto-vacuum), but hasn't ended up on the top of the priorities list yet.

Let me know if I'm not providing enough info.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jonathan Foy <thefoy(at)gmail(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: PD_ALL_VISIBLE flag warning
Date: 2010-04-08 04:59:12
Message-ID: t2u407d949e1004072159yaf03feffu1c18e03d75859bba@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Thu, Apr 1, 2010 at 2:10 PM, Jonathan Foy <thefoy(at)gmail(dot)com> wrote:
> Hello
>
> I came in this morning and noticed this warning sitting in my inbox quite a
> few times...
>
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
> page 471118
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
> page 471119
> WARNING:  PD_ALL_VISIBLE flag was incorrectly set in relation "table_2010q1"
> page 471120
> ....
>
> and I'm wondering how worked up I should be getting.

This is a warning about a fairly serious data corruption problem. In
the worst case prior queries returned incorrect results -- which if
those results were used to calculate subsequent updates could have
caused your data to be inconsistent.

Specifically, having that flag set would cause rows to be returned
which should not have been returned, either because they had been
deleted, the transaction inserting them aborted, or they had been
updated (causing duplicate copies of the same row to be returned).

There was a bug which could cause this in 8.4.0 but it was fixed in
8.4.1. You say you're running 8.4.1 (you should be running 8.4.3
though none of the subsequent bug-fixes are as serious if the SSL
security vulnerability isn't relevant) so this is quite worrisome.
Nobody else has reported anything similar though.

Is it possible these pages have never been vacuumed since you updated
from 8.4.0 to 8.4.1? I'm not sure how likely that is. In 8.4 Postgres
tries to avoid vacuuming pages that don't need it so it's possible
this situation could live on for longer than in the past.

I think a regular "VACUUM" of every table should guarantee that any
remnants of this bug from 8.4.0 are cleaned up. If you do this and
possibly get some more warnings from it but never see it again
afterwards then I would assume they're leftovers from that old bug.

--
greg


From: Jonathan Foy <thefoy(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: PD_ALL_VISIBLE flag warning
Date: 2010-04-08 12:57:00
Message-ID: g2v4b46b5f01004080557j25cb26f1ic054f92b4f543baa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Thanks for the reply...

We have an unhealthy habit of not updating very often around here, one that
I'm trying to fix. We run both 8.1 and 8.4, and we're testing the update to
8.1.20 this weekend....from 8.1.3. Across the board 8.4 update will take
much more testing due to the typing issue. Other 8.4 updates will take place
soon.

Anyway, my guilt about outdated versions aside...

We never actually ran 8.4.0. We went straight from 8.3.3 to 8.4.1. I did a
full dump/restore from the 8.3.3 database into the 8.4.1 database (using the
8.4 pg_dump), partitioning as I went. The table in question was not
partitioned in 8.3, the 8.4 dump/restore seemed as good a place to do the
partitioning as any. I don't specifically remember doing it, but I almost
always do a manual vacuum/analyze after a restore. Autovacuum is/was also
turned on. According to the stats, everything seems to have been at least
vacuumed on 2/15 (the date of the update I believe), so it is doubtful to me
that any of the tables, especially the currently active partition, would
have gone un-vacuumed as of 4/1.

The logs for that day have already rotated into less-easily available
storage, but I'm considering retrieving them to look for any
aborts/rollbacks.

Also, for what it's worth, we never delete rows from this table, we only
insert and update.

I have not seen the warning since.

Sound like trouble?

On Thu, Apr 8, 2010 at 12:59 AM, Greg Stark <gsstark(at)mit(dot)edu> wrote:

> On Thu, Apr 1, 2010 at 2:10 PM, Jonathan Foy <thefoy(at)gmail(dot)com> wrote:
> > Hello
> >
> > I came in this morning and noticed this warning sitting in my inbox quite
> a
> > few times...
> >
> > WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation
> "table_2010q1"
> > page 471118
> > WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation
> "table_2010q1"
> > page 471119
> > WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation
> "table_2010q1"
> > page 471120
> > ....
> >
> > and I'm wondering how worked up I should be getting.
>
> This is a warning about a fairly serious data corruption problem. In
> the worst case prior queries returned incorrect results -- which if
> those results were used to calculate subsequent updates could have
> caused your data to be inconsistent.
>
> Specifically, having that flag set would cause rows to be returned
> which should not have been returned, either because they had been
> deleted, the transaction inserting them aborted, or they had been
> updated (causing duplicate copies of the same row to be returned).
>
> There was a bug which could cause this in 8.4.0 but it was fixed in
> 8.4.1. You say you're running 8.4.1 (you should be running 8.4.3
> though none of the subsequent bug-fixes are as serious if the SSL
> security vulnerability isn't relevant) so this is quite worrisome.
> Nobody else has reported anything similar though.
>
> Is it possible these pages have never been vacuumed since you updated
> from 8.4.0 to 8.4.1? I'm not sure how likely that is. In 8.4 Postgres
> tries to avoid vacuuming pages that don't need it so it's possible
> this situation could live on for longer than in the past.
>
> I think a regular "VACUUM" of every table should guarantee that any
> remnants of this bug from 8.4.0 are cleaned up. If you do this and
> possibly get some more warnings from it but never see it again
> afterwards then I would assume they're leftovers from that old bug.
>
> --
> greg
>


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Jonathan Foy <thefoy(at)gmail(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: PD_ALL_VISIBLE flag warning
Date: 2010-04-08 14:16:41
Message-ID: t2j407d949e1004080716iff21f326p4cd52c79ed204359@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Thu, Apr 8, 2010 at 1:57 PM, Jonathan Foy <thefoy(at)gmail(dot)com> wrote:
> Thanks for the reply...
>
> We have an unhealthy habit of not updating very often around here, one that
> I'm trying to fix. We run both 8.1 and 8.4, and we're testing the update to
> 8.1.20 this weekend....from 8.1.3.  Across the board 8.4 update will take
> much more testing due to the typing issue. Other 8.4 updates will take place
> soon.

If you're on 8.1 it's important to be on the latest 8.1 -- there are
at least 17 significant bugs that warranted bug-fix releases between
8.1.3 and 8.1.20 including security bugs and data corruption bugs.

> We never actually ran 8.4.0.  We went straight from 8.3.3 to 8.4.1.  I did a
> full dump/restore from the 8.3.3 database into the 8.4.1 database (using the
> 8.4 pg_dump), partitioning as I went.

Ugh, then it sounds like there's a real as-yet unidentified problem.

The other thing to check before panicking is, has this server ever run
with fsync off and crashed (either kernel panic or power failure)?

What does the storage system on this server look like? Are they IDE or
SCSI drives? Is there an LVM layer involved? etc.

--
greg


From: Jonathan Foy <thefoy(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: PD_ALL_VISIBLE flag warning
Date: 2010-04-08 15:03:14
Message-ID: j2y4b46b5f01004080803yc018a6cbqf38ceedd6933817d@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Thu, Apr 8, 2010 at 10:16 AM, Greg Stark <gsstark(at)mit(dot)edu> wrote:

>
> If you're on 8.1 it's important to be on the latest 8.1 -- there are
> at least 17 significant bugs that warranted bug-fix releases between
> 8.1.3 and 8.1.20 including security bugs and data corruption bugs.
>

Understood. It's my current top priority.

>
> > We never actually ran 8.4.0. We went straight from 8.3.3 to 8.4.1. I
> did a
> > full dump/restore from the 8.3.3 database into the 8.4.1 database (using
> the
> > 8.4 pg_dump), partitioning as I went.
>
> Ugh, then it sounds like there's a real as-yet unidentified problem.
>
> The other thing to check before panicking is, has this server ever run
> with fsync off and crashed (either kernel panic or power failure)?
>

I have never turned off fsync. Also, the server is showing an uptime of 281
days, and the postgres instance has been up since 2/21.

>
> What does the storage system on this server look like? Are they IDE or
> SCSI drives? Is there an LVM layer involved? etc.
>
> --
> greg
>

Storage is on an NFS mounted Fiber Netapp volume.

Let me know if you need more...