Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid

Lists: pgsql-bugs
From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Date: 2011-02-22 06:43:58
Message-ID: AANLkTintLoKYsowSW5XiUW_eaX-xCGe+SvyrdzDR1HE3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

A medium-length story short, this query returns non-zero:

select count(distinct typnamespace) from pg_type where not exists
(select 1 from pg_namespace where oid = pg_type.typnamespace);

I did a very brief search in all the release notes for 8.3.5 to
8.3.14, but have not found precisely what I was looking for (searches
for namespace, schema, type, and corruption).

This was discovered when performing a pg_dump of this user's database,
whereby pg_dump complained when trying to dump types for lack of a
good catalog entry to nab the namespace name from. In our case, two
namespaces seem to be affected. The user of this database was never
privileged enough to even perform CREATE SCHEMA, to my knowledge, and
in fact only have the schema (owned by the postgres superuser) that
they began with.

Is it safe to perform an UPDATE on pg_type to give entries a valid
typnamespace? Is there any forensic evidence I can grab before doing
that to assist in figuring out the mechanism for this bug, if
applicable?

Cheers.

--
fdr


From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Date: 2011-02-22 06:46:45
Message-ID: AANLkTinN1=uth3zppMeUjetWev6gRKLWSGj8mt4jQyLZ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Feb 21, 2011 at 10:43 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> A medium-length story short, this query returns non-zero:
>
> select count(distinct typnamespace) from pg_type where  not exists
> (select 1 from pg_namespace where oid = pg_type.typnamespace);
>
> I did a very brief search in all the release notes for 8.3.5 to
> 8.3.14, but have not found precisely what I was looking for (searches
> for namespace, schema, type, and corruption).

It may also be useful information to know that no recent shenanigans
have happened on this server: it's been up continuously for about 500
days. That doesn't mean something interesting did not occur a very
long time ago, and I'm currently asking around for any notes about
interesting things that have occurred on this machine.

--
fdr


From: Daniel Farina <daniel(at)heroku(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Date: 2011-02-22 08:48:57
Message-ID: AANLkTi=jba6ZSkV5_kuJvDezv9yZhSEDser1=5AwSPoM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Feb 21, 2011 at 10:46 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> It may also be useful information to know that no recent shenanigans
> have happened on this server: it's been up continuously for about 500
> days. That doesn't mean something interesting did not occur a very
> long time ago, and I'm currently asking around for any notes about
> interesting things that have occurred on this machine.

From what I can tell, people only see this problem with pg_dump, which
is interesting. This symptom has a very long history:

http://archives.postgresql.org/pgsql-general/2004-02/msg00970.php
http://archives.postgresql.org/pgsql-admin/2006-10/msg00192.php
http://archives.postgresql.org/pgsql-bugs/2005-11/msg00305.php
http://archives.postgresql.org/pgsql-bugs/2010-01/msg00087.php
http://archives.postgresql.org/pgsql-general/2011-02/msg00334.php

Something I'm not sure any of these mention that's very interesting in
my case that may be crucial information:

In my case, there are two "missing" pg_namespace entries, and both
have the same missing relations. Both of them have "credible" looking
OIDs (in the hundreds of thousands, and one after the other) as well
as "credible" looking ancillary information:

* all owners are correct

* there are exactly four relfrozenxid values. They look like this:

SELECT distinct c.relnamespace, relfrozenxid::text
FROM pg_class c
LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_tablespace t ON t.oid = c.reltablespace
WHERE nspname IS NULL;
relnamespace | relfrozenxid
--------------+--------------
320204 | 0
320204 | 6573962
320527 | 0
320527 | 6574527

Note that relfrozenxic increases along with the oid, which is
generally what you'd expect. Some relations have no frozen xid.

* This is affecting the following features the user has used:
sequences, relations, indexes (in this case, they are all _pkey
indexes)

* There's also a valid version of all these relations/objects that
*are* connected to the schema that's alive and expected. As such, \dt,
\dn seem to work as one would expect. The modern namespace OID is
378382, which is in line with a smooth monotonic increase over time.

* Each relkind has its own relfilenode, and they all do appear to
exist in the cluster directory. I didn't spot any big ones from a
random sampling (I can write a comprehensive one on request), but some
were 8KB and some were 16KB, which might suggest that some data is in
some of them.

More forensics tomorrow.

Sadly, for whatever reason, pg_dump --schema=public didn't seem to
help me out. We do need a workaround if we wish to keep doing
forensics.

--
fdr


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Date: 2011-02-22 11:30:52
Message-ID: AANLkTinwCnZPCXi0DghH7Jw2By+iomSZhSv+ux2-2mTr@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, Feb 22, 2011 at 8:48 AM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
>  The modern namespace OID is
> 378382, which is in line with a smooth monotonic increase over time.

Wait, what? namespace OID is the OID of the schema. The OID of an
object doesn't change over the lifetime of the object, it's a unique
identifier. So the only reason this would be increasing like this
would be if you're creating schemas continually over time. What
actually is going on in this database?

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Date: 2011-02-22 16:54:30
Message-ID: 1652.1298393670@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Daniel Farina <daniel(at)heroku(dot)com> writes:
> From what I can tell, people only see this problem with pg_dump, which
> is interesting. This symptom has a very long history:

Yeah. There seems to be some well-hidden bug whereby dropping an object
sometimes fails to drop (some of?) its dependencies. I'm still looking
for a reproducible case, or even a hint as to what the trigger condition
might be.

> In my case, there are two "missing" pg_namespace entries, and both
> have the same missing relations.

Uh, what do you mean by "same missing relations"?

> * There's also a valid version of all these relations/objects that
> *are* connected to the schema that's alive and expected.

And this isn't making any sense to this onlooker, either. Could you
provide a more detailed explanation of the usage pattern in this
database? I speculate that what you mean is the user periodically
drops and recreates a schema + its contents, but please be explicit.

> Sadly, for whatever reason, pg_dump --schema=public didn't seem to
> help me out. We do need a workaround if we wish to keep doing
> forensics.

Yeah, pg_dump is written to glom onto everything listed in the catalogs
and sort it out later. So it tends to notice inconsistencies that you
might not notice in regular usage of the database. It's sort of hard to
avoid, since for example a --schema switch depends on seeing which
objects belong to which schema ...

regards, tom lane


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: 8.3.5: Types with typnamespace pointing at non-existent pg_namespace oid
Date: 2011-02-22 18:32:53
Message-ID: AANLkTikcDk_ge-D5g0mMCKaN2X_D=skO7xdToR8fPzvL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, Feb 22, 2011 at 8:54 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <daniel(at)heroku(dot)com> writes:
>> From what I can tell, people only see this problem with pg_dump, which
>> is interesting. This symptom has a very long history:
>
> Yeah.  There seems to be some well-hidden bug whereby dropping an object
> sometimes fails to drop (some of?) its dependencies.  I'm still looking
> for a reproducible case, or even a hint as to what the trigger condition
> might be.
>
>> In my case, there are two "missing" pg_namespace entries, and both
>> have the same missing relations.
>
> Uh, what do you mean by "same missing relations"?

There are an identical set of relations (including quasi-relations
like indexes and sequences) with relnames and most other properties
that are identical between the versions that are tied with each of the
two missing namespaces. There's also a superset of those (but that may
be partially or totally explained by the current set being more recent
as the application as grown) that are seen with a normal looking
pg_namespace record. All three copies of these formations seem to have
very sensible pg_class/pg_type/pg_sequence formations in their
respective relnamespaces.

>> * There's also a valid version of all these relations/objects that
>> *are* connected to the schema that's alive and expected.
>
> And this isn't making any sense to this onlooker, either.  Could you
> provide a more detailed explanation of the usage pattern in this
> database?  I speculate that what you mean is the user periodically
> drops and recreates a schema + its contents, but please be explicit.

We run quite a large number of databases, and I unfortunately think
that this particular fault has occurred in what could be called
ancient history, as far as log retention is concerned. Sadly our
investigation will have to be limited to what we can find at this
time, although we can probably slowly work our way to being able to
catch this one in the act. We might also be able to run a catalog
query across other databases to get a sense as to the frequency of the
problem.

It may be worth noting in this case that the user does not own the
schema that is thought to be dropped (or, in fact, any schemas at
all), so DROP SCHEMA as issued by them is not likely a culprit. I will
ask around as to what administrative programs we possess that might
fool with the schema. Still, such a program is probably run many times
across many databases. This is why I'm scratching my head about the
fact that two sets of such bogus relnamespace references were
produced.

Although I have no idea how such a thing could happen, is it possible
that both copies come from one occurrence of the bug?

> Yeah, pg_dump is written to glom onto everything listed in the catalogs
> and sort it out later.  So it tends to notice inconsistencies that you
> might not notice in regular usage of the database.  It's sort of hard to
> avoid, since for example a --schema switch depends on seeing which
> objects belong to which schema ...

I figured as much, although if it were written slightly differently
(starting from oid where nspname = 'public') then perhaps it would not
run into problems. I was meaning to poke at pg_depend to see if
anything interesting can be seen in there.

I'll probably hack up pg_dump to try to step around the yucky
relations so we can ensure that this database gets a clean-looking
restore elsewhere before we put the strange-looking database on ice --
permanently, if you think there is no value in having it around.

--
fdr