Missing tables in postgresql 7.2.4

Lists: pgsql-bugs
From: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
To: pgsql-bugs(at)postgresql(dot)org
Cc: mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE
Subject: Missing tables in postgresql 7.2.4
Date: 2005-05-11 14:59:43
Message-ID: 1050511165943.ZM11952@TechFak.Uni-Bielefeld.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

we recently discovered on our production database an a little bit bizarre
problem (after two years stable operations). Some tables are simply missing, or
sometimes the affected table(s) is/are there but not listed in pg_tables.

An example (I am looking for the table kog_blasthit_tables):

dev_db=# select version();
version
-------------------------------------------------------------------------
PostgreSQL 7.2.4 on sparc-sun-solaris2.8, compiled by GCC gcc (GCC) 3.1
(1 row)

dev_db=# \d kog_ <TAB EXPANSION>
kog_blasthits_template kog_hits_obj_2374_q411_db7
kog_hits_obj_1016_q344_db7 kog_hits_obj_2396_q833_db7
kog_hits_obj_1341_q475_db7 kog_hits_obj_2491_q505_db7
kog_hits_obj_1362_q412_db7 kog_hits_obj_2516_q900_db7
kog_hits_obj_1364_q413_db7 kog_hits_obj_2559_q78_db7
kog_hits_obj_1584_q570_db7 kog_hits_obj_822_q369_db7
kog_hits_obj_1604_q574_db7 kog_hits_obj_834_q371_db7
kog_hits_obj_1660_q608_db7 kog_hits_obj_846_q375_db7
kog_hits_obj_1725_q650_db7 kog_hits_obj_880_q339_db7
kog_hits_obj_1737_q605_db7 kog_info
kog_hits_obj_2186_q686_db7 kog_org

// No kog_blasthit_tables !!!

BUT it seems to be there:

dev_db=# select * FROM kog_blasthit_tables LIMIT 1;
id | object_id | query_set_id | db_set_id | num_of_query_sequences |
table_name | table_desc | number_of_entries |
blast_parameters | blast_prg | evalue_cutoff | owner
| project | created | status |
number_of_unique_entries | reciprocal_first_id | reciprocal_second_id
-----+-----------+--------------+-----------+------------------------+----------------------------+------------+-------------------+-------------------------------------------+-----------+---------------+----------+---------+-------------------------------+----------+--------------------------+---------------------+----------------------
398 | 1341 | 475 | 7 | 26187 |
kog_hits_obj_1341_q475_db7 | na | 0 | -a 1 -m 7 -T F -F
T -p blastx -e 0.100000 | kogblast | 0.1 | mbeckste | 23 |
2004-02-26 18:36:48.048881+01 | finished | 27426 |
-1 | -1
(1 row)

dev_db=#

// The table seems to be there but not in pg_tables

dev_db=# select * FROM pg_tables WHERE tablename='kog_blasthit_tables';
tablename | tableowner | hasindexes | hasrules | hastriggers
-----------+------------+------------+----------+-------------
(0 rows)

dev_db=# \d pg_ta <TAB EXPANSION>
// nothing but

dev_db=# select count(*) FROM pg_tables;
count
-------
875
(1 row)

The case which is even worth, is that the table is missing completely. We have
a catalog table that keeps track of generated tables, so we know that it has to
be there. The postmaster logs (with high debug level) show no DROP TABLE
between the time of the CREATE TABLE statement of the affected table and the
time where we noticed that it is missing.

I know that this is probably not enough information for a precise analysis. May
be someone can give me a hint how to further investigate this issue.

regards
Michael

--

------------------------------------------------------------------------------
Dipl.-Inform. Michael Beckstette Office: M3-129
AG-PI / Technische Fakultaet EMail:mbeckste(at)techfak(dot)uni-bielefeld(dot)de
Universitaet Bielefeld Fon: +49-521-106-2914
Postfach 100131 Fax: +49-521-106-6411
D-33501 BIELEFELD
Germany


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-11 15:51:45
Message-ID: 4527.1115826705@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE> writes:
> we recently discovered on our production database an a little bit bizarre
> problem (after two years stable operations). Some tables are simply missing, or
> sometimes the affected table(s) is/are there but not listed in pg_tables.

This sounds a bit like a transaction ID wraparound problem. Have you
been vacuuming your whole database on a reasonable schedule? The
missing tables might conceivably be old enough that their pg_class rows
have wrapped around "into the future". It'd be useful to look at
SELECT datname, age(datfrozenxid) FROM pg_database;

regards, tom lane


From: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-11 16:11:41
Message-ID: 1050511181141.ZM13388@TechFak.Uni-Bielefeld.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi Tom,

this is the output from 'SELECT datname, age(datfrozenxid) FROM pg_database;'

datname | age
-------------+-------------
xgc | -1950241750
dev_db | -1886587214
template1 | -1884294460
template0 | -1884294460
promo_db | -1884294460
snap_db_new | -1884294460
gendev_db | 1887538137
(7 rows)

dev_db=#

The affected DB is 'dev_db', although it looks like the others except
'gendev_db' have a wraparound problem too (?). To answer your question about
the VACUUM: We VACUUM FULL a few tables with a high amount of INSERT/DEL
operations once per hour, but I have to admit that we perform a VACUUM of the
whole DB not on a regulary basis. I think the last one was several monthes ago.
Further on we use transactions at several places and we have at least 20
transactions per minute.

Does now a normal VACUUM FULL of the whole DB(s) fix our problem?

Michael

On May 11, 11:51am, Tom Lane wrote:
> Subject: Re: [BUGS] Missing tables in postgresql 7.2.4
> "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE> writes:
> > we recently discovered on our production database an a little bit bizarre
> > problem (after two years stable operations). Some tables are simply
missing, or
> > sometimes the affected table(s) is/are there but not listed in pg_tables.
>
> This sounds a bit like a transaction ID wraparound problem. Have you
> been vacuuming your whole database on a reasonable schedule? The
> missing tables might conceivably be old enough that their pg_class rows
> have wrapped around "into the future". It'd be useful to look at
> SELECT datname, age(datfrozenxid) FROM pg_database;
>
> regards, tom lane

--

------------------------------------------------------------------------------
Dipl.-Inform. Michael Beckstette Office: M3-129
AG-PI / Technische Fakultaet EMail:mbeckste(at)techfak(dot)uni-bielefeld(dot)de
Universitaet Bielefeld Fon: +49-521-106-2914
Postfach 100131 Fax: +49-521-106-6411
D-33501 BIELEFELD
Germany


From: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-11 17:55:21
Message-ID: 1050511195521.ZM17433@TechFak.Uni-Bielefeld.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

after reading the docs (I know it was a little bit late), I am now relatively
sure that I trapped into a transaction ID wraparound problem. For me its now a
little bit unclear, how to proceed in order to minimize the caused damage.

I checked all tables in the affected DB. Till now, the results are as follows:

6 user tables are completely lost.
8 user tables are not listed in pg_tables but still accessible by a
SELECT.

The 6 completely lost tables are not so dramatical, because they contain only
static data, that I can restore from the development system. But what happens
with the 8 tables that are still accessable, but not listed in pg_tables, after
a VACUUM? Will they be removed completely or 'reinserted' into pg_tables?

Does anyone has an advise how to proceed in this situation?

Regards
Michael

On May 11, 6:11pm, Michael Beckstette wrote:
> Subject: Re: [BUGS] Missing tables in postgresql 7.2.4
> Hi Tom,
>
> this is the output from 'SELECT datname, age(datfrozenxid) FROM pg_database;'
>
> datname | age
> -------------+-------------
> xgc | -1950241750
> dev_db | -1886587214
> template1 | -1884294460
> template0 | -1884294460
> promo_db | -1884294460
> snap_db_new | -1884294460
> gendev_db | 1887538137
> (7 rows)
>
> dev_db=#
>
> The affected DB is 'dev_db', although it looks like the others except
> 'gendev_db' have a wraparound problem too (?). To answer your question about
> the VACUUM: We VACUUM FULL a few tables with a high amount of INSERT/DEL
> operations once per hour, but I have to admit that we perform a VACUUM of the
> whole DB not on a regulary basis. I think the last one was several monthes
ago.
> Further on we use transactions at several places and we have at least 20
> transactions per minute.
>
> Does now a normal VACUUM FULL of the whole DB(s) fix our problem?
>
> Michael
>
>
> On May 11, 11:51am, Tom Lane wrote:
> > Subject: Re: [BUGS] Missing tables in postgresql 7.2.4
> > "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE> writes:
> > > we recently discovered on our production database an a little bit bizarre
> > > problem (after two years stable operations). Some tables are simply
> missing, or
> > > sometimes the affected table(s) is/are there but not listed in pg_tables.
> >
> > This sounds a bit like a transaction ID wraparound problem. Have you
> > been vacuuming your whole database on a reasonable schedule? The
> > missing tables might conceivably be old enough that their pg_class rows
> > have wrapped around "into the future". It'd be useful to look at
> > SELECT datname, age(datfrozenxid) FROM pg_database;
> >
> > regards, tom lane
>
>
>
> --
>
> ------------------------------------------------------------------------------
> Dipl.-Inform. Michael Beckstette Office: M3-129
> AG-PI / Technische Fakultaet
EMail:mbeckste(at)techfak(dot)uni-bielefeld(dot)de
> Universitaet Bielefeld Fon: +49-521-106-2914
> Postfach 100131 Fax: +49-521-106-6411
> D-33501 BIELEFELD
> Germany
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>-- End of excerpt from Michael Beckstette

--

------------------------------------------------------------------------------
Dipl.-Inform. Michael Beckstette Office: M3-129
AG-PI / Technische Fakultaet EMail:mbeckste(at)techfak(dot)uni-bielefeld(dot)de
Universitaet Bielefeld Fon: +49-521-106-2914
Postfach 100131 Fax: +49-521-106-6411
D-33501 BIELEFELD
Germany


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-11 18:39:26
Message-ID: 13800.1115836766@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE> writes:
> The 6 completely lost tables are not so dramatical, because they contain only
> static data, that I can restore from the development system. But what happens
> with the 8 tables that are still accessable, but not listed in pg_tables, after
> a VACUUM? Will they be removed completely or 'reinserted' into pg_tables?

> Does anyone has an advise how to proceed in this situation?

What I would recommend as a first step is to stop the postmaster and
then take a tarball backup of the entire $PGDATA tree. This will at
least provide a chance to go back if subsequent tries mess things up
completely.

After that, I'd try a plain VACUUM (not FULL, not FREEZE) of pg_class
and see if that restores the missing tables to view in pg_tables.
If it does, go ahead and do a database-wide plain VACUUM, and you
should be OK. If it doesn't, we'll need to think of another plan.

regards, tom lane


From: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-12 00:06:42
Message-ID: 1050512020642.ZM21135@TechFak.Uni-Bielefeld.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

On May 11, 2:39pm, Tom Lane wrote:

> What I would recommend as a first step is to stop the postmaster and
> then take a tarball backup of the entire $PGDATA tree. This will at
> least provide a chance to go back if subsequent tries mess things up
> completely.

Done. This was probably the biggest tar ball I have ever build (~450GB) ;)

>
> After that, I'd try a plain VACUUM (not FULL, not FREEZE) of pg_class
> and see if that restores the missing tables to view in pg_tables.

Done. The tables that were missing in pg_tables (but accessible with SELECT)
are now listed in pg_tables.

> If it does, go ahead and do a database-wide plain VACUUM, and you
> should be OK.

Done. As far as I can tell, everything is OK again.

Thanks a lot Tom!

P.S.:A TODO for me: CRON Script for weekly VACUUM ;)

Michael

--

------------------------------------------------------------------------------
Dipl.-Inform. Michael Beckstette Office: M3-129
AG-PI / Technische Fakultaet EMail:mbeckste(at)techfak(dot)uni-bielefeld(dot)de
Universitaet Bielefeld Fon: +49-521-106-2914
Postfach 100131 Fax: +49-521-106-6411
D-33501 BIELEFELD
Germany


From: John R Pierce <pierce(at)hogranch(dot)com>
To: Michael Beckstette <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-12 01:57:54
Message-ID: 4282B822.2040505@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> P.S.:A TODO for me: CRON Script for weekly VACUUM ;)

on heavy use databases, mine generally does a light vacuum every 4 hours, and a
once a day full on everything. also, a weekly full reindex on a really really
heavy use systems like this one message board server I ad-mangle

something like...

7 */4 * * * vacuumdb busy_database >/dev/null 2>/dev/null
37 0 * * * vacuumdb -a -z >/dev/null 2>/dev/null
37 2 * * Sun reindexdb -a > /dev/null 2>/dev/null

many might say this is total overkill, I dunno. keeps this server happy. btw,
reindexdb is a script from contrib.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-12 05:05:06
Message-ID: 26962.1115874306@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE> writes:
> On May 11, 2:39pm, Tom Lane wrote:
>> If it does, go ahead and do a database-wide plain VACUUM, and you
>> should be OK.

> Done. As far as I can tell, everything is OK again.

Sweet ;-) In the words of my former business partner, a private pilot
with more hours aloft than many airline captains: "Walked away from
another one ..."

For the benefit of onlookers, the gambit being played here went like
this: the missing pg_class rows must have fairly recently wrapped around
the 2G transaction mark with respect to the current XID counter. That
made them "in the future" not "in the past" as far as normal queries
go. However, a VACUUM will freeze-as-good any tuples that are "in the
past" with respect to the vacuum freeze time, which for a plain VACUUM
is 1G transactions ago. So as long as Michael notices he has a problem
within 1 billion transactions of having a problem, he can get out of it.

I cannot claim that this behavior was operating-as-designed, because
I'm pretty sure we hadn't thought it through when planning the
wraparound XID behavior. But we walked away from another one.

regards, tom lane


From: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-12 13:27:58
Message-ID: 1050512152758.ZM28281@TechFak.Uni-Bielefeld.DE
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hello,

two last questions about the transaction ID wraparound problem of my DBs I had
yesterday (see former postings). After applying the 'recovery procedure' Tom
suggested now my XIDs are looking almost fine again, except for the template0
DB.

dev_db=# SELECT datname, age(datfrozenxid) FROM pg_database;

datname | age
-------------+-------------
xgc | 1075434814
dev_db | 1074296718
template1 | 1075486644
template0 | -1882457315
promo_db | 1075528357
snap_db_new | 1075528467
gendev_db | 1075512627

Is this negligible or can it cause any harm in the future? Is there a way to
VACUUM template0 as well?

My second point is more a suggestion. After Toms strategy worked quite well (at
least for me) it is maybe worthwhile to put it somewhere in the docs. Maybe
together with other things in a section called 'Disaster recovery strategies'?

regards
Michael

--

------------------------------------------------------------------------------
Dipl.-Inform. Michael Beckstette Office: M3-129
AG-PI / Technische Fakultaet EMail:mbeckste(at)techfak(dot)uni-bielefeld(dot)de
Universitaet Bielefeld Fon: +49-521-106-2914
Postfach 100131 Fax: +49-521-106-6411
D-33501 BIELEFELD
Germany


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Missing tables in postgresql 7.2.4
Date: 2005-05-12 14:41:25
Message-ID: 543.1115908885@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Michael Beckstette" <mbeckste(at)TechFak(dot)Uni-Bielefeld(dot)DE> writes:
> two last questions about the transaction ID wraparound problem of my DBs I had
> yesterday (see former postings). After applying the 'recovery procedure' Tom
> suggested now my XIDs are looking almost fine again, except for the template0
> DB.

You don't have to worry about template0 -- it was "frozen" during initdb
and does not need vacuuming.

> My second point is more a suggestion. After Toms strategy worked quite
> well (at least for me) it is maybe worthwhile to put it somewhere in
> the docs. Maybe together with other things in a section called
> 'Disaster recovery strategies'?

I'm not sure it can be claimed to be tested well enough to publish as a
recovery strategy. I'm glad it worked for you, but ...

regards, tom lane