ERROR: could not open relation

Lists: pgsql-general
From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: ERROR: could not open relation
Date: 2005-07-14 01:06:55
Message-ID: 6F2B98A0-54B9-4D7B-B091-9941EF00A52A@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I have a production database where we just encountered the following
error:

ERROR: could not open relation 1663/32019395/94144936: No such file
or directory

Here's the output of SELECT version():

PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC 2.95.4

Here's uname -a:

Linux <hostname> 2.6.11.8 #8 SMP Tue Jun 21 11:18:03 CDT 2005 i686
unknown

JFS is the filesystem.

Interestingly, this isn't a FATAL error, but after it occurred, not a
single query was working, and, in fact, all queries seemed to
generate the error. I wasn't present when the error occurred, and by
the time I became available, the box had been rebooted, and
pg_autovacuum, which runs by default, had been started. Otherwise,
everything seems to have come up as expected. I've since killed
pg_autovacuum.

Is there any way to get more information about why this error
occurred and what else I might need to do to recover from it?

I saw this post by Tom Lane in a thread from earlier this year:

http://archives.postgresql.org/pgsql-admin/2005-04/msg00227.php

This makes me ask a possibly unrelated question: what is the 1663
prefix in the relation string? When I examine $PGDATA/base, the
directories within seem to be those that start after the 1663. As in,
I see $PGDATA/base/32019395, not $PGDATA/base/1663/32019395.

Anyway, if I do a lookup by oid for 94144936 in pg_class, I don't see
it. And, clearly, it's not in $PGDATA/base/32019395.

Are the recommendations the same as in the other thread? REINDEX
DATABASE? (What is a "standalone backend"? A single-user version?)
Avoid VACUUMing? pg_dump and reload?

The database is currently running. Should I stop it to prevent
further damage?

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 02:14:09
Message-ID: 5C38136F-72DC-4DCE-BE1A-11EAE3447848@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I'm developing a habit of being the most frequent replier to my own
posts, but anyway: I discovered the meaning of 1663, which is the
default tablespace oid.

But I still need help with diagnosis and treatment...

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 13, 2005, at 8:06 PM, Thomas F. O'Connell wrote:

> I have a production database where we just encountered the
> following error:
>
> ERROR: could not open relation 1663/32019395/94144936: No such
> file or directory
>
> Here's the output of SELECT version():
>
> PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC 2.95.4
>
> Here's uname -a:
>
> Linux <hostname> 2.6.11.8 #8 SMP Tue Jun 21 11:18:03 CDT 2005 i686
> unknown
>
> JFS is the filesystem.
>
> Interestingly, this isn't a FATAL error, but after it occurred, not
> a single query was working, and, in fact, all queries seemed to
> generate the error. I wasn't present when the error occurred, and
> by the time I became available, the box had been rebooted, and
> pg_autovacuum, which runs by default, had been started. Otherwise,
> everything seems to have come up as expected. I've since killed
> pg_autovacuum.
>
> Is there any way to get more information about why this error
> occurred and what else I might need to do to recover from it?
>
> I saw this post by Tom Lane in a thread from earlier this year:
>
> http://archives.postgresql.org/pgsql-admin/2005-04/msg00227.php
>
> This makes me ask a possibly unrelated question: what is the 1663
> prefix in the relation string? When I examine $PGDATA/base, the
> directories within seem to be those that start after the 1663. As
> in, I see $PGDATA/base/32019395, not $PGDATA/base/1663/32019395.
>
> Anyway, if I do a lookup by oid for 94144936 in pg_class, I don't
> see it. And, clearly, it's not in $PGDATA/base/32019395.
>
> Are the recommendations the same as in the other thread? REINDEX
> DATABASE? (What is a "standalone backend"? A single-user version?)
> Avoid VACUUMing? pg_dump and reload?
>
> The database is currently running. Should I stop it to prevent
> further damage?
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
>
> Strategic Open Source: Open Your i™
>
> http://www.sitening.com/
> 110 30th Avenue North, Suite 6
> Nashville, TN 37203-6320
> 615-260-0005
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 12:57:59
Message-ID: 13837.1121345879@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas F. O'Connell" <tfo(at)sitening(dot)com> writes:
> Anyway, if I do a lookup by oid for 94144936 in pg_class, I don't see
> it. And, clearly, it's not in $PGDATA/base/32019395.

You should be looking at relfilenode. See
http://www.postgresql.org/docs/8.0/static/storage.html
and/or use oid2name to figure out what table is being complained of.

regards, tom lane


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 15:49:56
Message-ID: 4BDDBCBA-46A8-401A-B183-B090A2177827@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

The oid in question does not correspond to a relfilenode, and
oid2name -o 94144936 doesn't return anything when run against the
database in question.

Could this be related to temp tables? We use a lot of them in data
imports, and this was a point of discussion on IRC.

Having a limited understanding of postgres internals, I apologize if
any of the following sound like nonsensically wild hairs:

Does VACUUM (under the auspices of pg_autovacuum) attempt to vacuum
temp tables, which are otherwise not visible outside of a given session?

Does bgwriter operate on temp tables, and could there exist an edge
condition in which bgwriter might have scheduled a write to disk for
a file corresponding to a temp table that was removed by sudden
termination of the session in which the temp table existed such that
the file was removed?

One of the puzzling things to me, for instance, is that this error
persisted, so we're wondering if maybe bgwriter refused to do any
more writing because the thing it was scheduled to write ceased to
exist without telling it.

In the aftermath, a pg_dump (combined with pg_restore for
verification) completed successfully. Do I still have cause for concern?

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 14, 2005, at 7:57 AM, Tom Lane wrote:

> "Thomas F. O'Connell" <tfo(at)sitening(dot)com> writes:
>
>> Anyway, if I do a lookup by oid for 94144936 in pg_class, I don't see
>> it. And, clearly, it's not in $PGDATA/base/32019395.
>>
>
> You should be looking at relfilenode. See
> http://www.postgresql.org/docs/8.0/static/storage.html
> and/or use oid2name to figure out what table is being complained of.
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 16:07:58
Message-ID: 20200.1121357278@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas F. O'Connell" <tfo(at)sitening(dot)com> writes:
> Could this be related to temp tables?

Possibly, given that the table doesn't seem to be there anymore.

> Does bgwriter operate on temp tables, and could there exist an edge
> condition in which bgwriter might have scheduled a write to disk for
> a file corresponding to a temp table that was removed by sudden
> termination of the session in which the temp table existed such that
> the file was removed?

Do you have some evidence that the bgwriter was what was reporting the
error? You didn't say that before.

The bgwriter only works on dirty shared buffers, so the only way this
could be happening is if a page of a temp table had gotten loaded into
a shared buffer, which isn't supposed to happen really. Is it possible
that you had some backend deliberately trying to read a temp table
created by another backend? (You don't have to assume that the
interloper tried to modify the table; a mere SELECT could have created
the dirty-buffer condition due to hint-bit update. You do have to
assume that the interloper was superuser, though, else permissions
would have stopped him from accessing someone else's temp table.)

regards, tom lane


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 16:14:03
Message-ID: 20050714161403.GD19232@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Jul 14, 2005 at 10:49:56AM -0500, Thomas F. O'Connell wrote:

> Does bgwriter operate on temp tables, and could there exist an edge
> condition in which bgwriter might have scheduled a write to disk for
> a file corresponding to a temp table that was removed by sudden
> termination of the session in which the temp table existed such that
> the file was removed?

I suggested that bgwriter may be the culprit, mainly because the log
lines were not preceded by the log_line_prefix as the other lines in the
log. See an extract here: http://rafb.net/paste/results/awxFnY15.html
This may represent a file going away, and a dirty buffer being kept in
memory. How did that happen, I have no clue.

Thomas also mentioned that after the error first appeared, all queries
started failing with the same error message. That does not make any
sense to me; but maybe it could have to do with a corrupt buffer in the
buffer freelist, which every backend tried to write but failed.

I guess the important question to be asking is how did the system get
into that state.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>, PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 16:33:58
Message-ID: 20505.1121358838@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> I suggested that bgwriter may be the culprit, mainly because the log
> lines were not preceded by the log_line_prefix as the other lines in the
> log. See an extract here: http://rafb.net/paste/results/awxFnY15.html

Hmm, what are the logging configuration parameters here exactly?

> Thomas also mentioned that after the error first appeared, all queries
> started failing with the same error message. That does not make any
> sense to me; but maybe it could have to do with a corrupt buffer in the
> buffer freelist, which every backend tried to write but failed.

I have an old note to myself that persistent write errors could "clog"
the bgwriter, because I was worried that after an error it would
stupidly try to write the same buffer again instead of trying to make
progress elsewhere. (CVS tip might be better about this, I'm not sure.)
A dirty buffer for a file that doesn't exist anymore would certainly
qualify as a persistent failure.

> I guess the important question to be asking is how did the system get
> into that state.

Yeah.

regards, tom lane


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 16:42:40
Message-ID: C2D30B1C-4846-4B5E-A8FF-8A645A61307C@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Sorry, I didn't have the evidence about the bgwriter before. It was
based on conjecture on IRC last night and newly gathered evidence
from this morning.

Here's a list of current postgres processes on the box.

postgres 1186 2.8 5.0 437812 417624 ? S Jul13 22:37
postgres: writer process
postgres 1187 0.3 0.0 5940 2688 ? S Jul13 2:54
postgres: stats buffer process
postgres 1188 3.1 0.1 13456 8856 ? S Jul13 25:16
postgres: stats collector process

My assumption is that it's typically the case that these three
processes generally get allocated sequential pids when postgres starts.

In the postgres log, we see these two types of errors, which were the
only ones that did not report an IP address:

2005-07-12 01:53:31 CDT 13390 :LOG: statistics buffer is full
2005-07-13 17:44:51 CDT 13389 :ERROR: could not open relation
1663/32019395/94144936: No such file or directory

So if we assume that pid 13390 referred to the stats collector from
yesterday, then presumably 13389 was the bgwriter.

Unfortunately, this is a system where the interloper is superuser
(and, yes, changing this has been a TODO). But even so, I need help
understanding how one backend could access the temp table of another.
Which is what brings me to vacuum or some other system process as a
culprit.

Recognizing that the application code will execute as superuser in
postgres, here is what is happening in a session:

Several temporary tables (some of which inherit from actual tables)
are constructed.
Data is loaded in.
If the data includes updates, in the same session, a VACUUM is
performed, else an ANALYZE is performed.

So we know these things:

1. This import process was running.
2. It had started the vacuum, which occurs in the same session as
temp tables that inherit from the table being vacuumed.
3. bgwriter reported an error about a missing relation file (I guess
this is a strong suspicion more than knowledge, but we strongly
suspect).

So could this be somehow related to the fact that VACUUM, as a result
of the inheritance relationship in the temp tables, is explicitly
attempting to access them?

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 14, 2005, at 11:07 AM, Tom Lane wrote:

> Do you have some evidence that the bgwriter was what was reporting the
> error? You didn't say that before.
>
> The bgwriter only works on dirty shared buffers, so the only way this
> could be happening is if a page of a temp table had gotten loaded into
> a shared buffer, which isn't supposed to happen really. Is it
> possible
> that you had some backend deliberately trying to read a temp table
> created by another backend? (You don't have to assume that the
> interloper tried to modify the table; a mere SELECT could have created
> the dirty-buffer condition due to hint-bit update. You do have to
> assume that the interloper was superuser, though, else permissions
> would have stopped him from accessing someone else's temp table.)
>
> regards, tom lane


From: "Thomas F(dot) O'Connell" <tfo(at)alumni(dot)brown(dot)edu>
To: PgSQL General <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 16:59:39
Message-ID: BEB4C1F9-FEDA-4ACF-9459-F7AB460FF60F@alumni.brown.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

One other detail: pg_autovacuum is running on this system.

I just noticed this from Tom's "Autovacuum loose ends" post from
earlier today:

"The code does not make a provision to ignore temporary tables.
Although vacuum.c and analyze.c will disregard the request to touch
such tables, it'd probably be better to recognize the situation further
upstream. In particular it seems that autovacuum will continually throw
ANALYZE requests for a temp table due to lack of stats."

Is this an issue with the current pg_autovacuum implementation? Is it
somehow involved?

Though I feel like we're a little closer to solving this mystery, I
want to target the correct vacuuming process with a fix. I have a
feeling that explicitly dropping the temp tables in the process that
also calls VACUUM prior to the VACUUM is a good short-term fix, but I
also want to know whether continuing to run pg_autovacuum with this
architecture is a bad idea. If so, we can revert to not using temp
tables at all.

Further, why have we only noticed it once when this version of code
(and PostgreSQL) has been running for weeks?

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 14, 2005, at 11:42 AM, Thomas F. O'Connell wrote:

> Sorry, I didn't have the evidence about the bgwriter before. It was
> based on conjecture on IRC last night and newly gathered evidence
> from this morning.
>
> Here's a list of current postgres processes on the box.
>
> postgres 1186 2.8 5.0 437812 417624 ? S Jul13 22:37
> postgres: writer process
> postgres 1187 0.3 0.0 5940 2688 ? S Jul13 2:54
> postgres: stats buffer process
> postgres 1188 3.1 0.1 13456 8856 ? S Jul13 25:16
> postgres: stats collector process
>
> My assumption is that it's typically the case that these three
> processes generally get allocated sequential pids when postgres
> starts.
>
> In the postgres log, we see these two types of errors, which were
> the only ones that did not report an IP address:
>
> 2005-07-12 01:53:31 CDT 13390 :LOG: statistics buffer is full
> 2005-07-13 17:44:51 CDT 13389 :ERROR: could not open relation
> 1663/32019395/94144936: No such file or directory
>
> So if we assume that pid 13390 referred to the stats collector from
> yesterday, then presumably 13389 was the bgwriter.
>
> Unfortunately, this is a system where the interloper is superuser
> (and, yes, changing this has been a TODO). But even so, I need help
> understanding how one backend could access the temp table of
> another. Which is what brings me to vacuum or some other system
> process as a culprit.
>
> Recognizing that the application code will execute as superuser in
> postgres, here is what is happening in a session:
>
> Several temporary tables (some of which inherit from actual tables)
> are constructed.
> Data is loaded in.
> If the data includes updates, in the same session, a VACUUM is
> performed, else an ANALYZE is performed.
>
> So we know these things:
>
> 1. This import process was running.
> 2. It had started the vacuum, which occurs in the same session as
> temp tables that inherit from the table being vacuumed.
> 3. bgwriter reported an error about a missing relation file (I
> guess this is a strong suspicion more than knowledge, but we
> strongly suspect).
>
> So could this be somehow related to the fact that VACUUM, as a
> result of the inheritance relationship in the temp tables, is
> explicitly attempting to access them?
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
>
> Strategic Open Source: Open Your i™
>
> http://www.sitening.com/
> 110 30th Avenue North, Suite 6
> Nashville, TN 37203-6320
> 615-260-0005
>
> On Jul 14, 2005, at 11:07 AM, Tom Lane wrote:
>
>
>> Do you have some evidence that the bgwriter was what was reporting
>> the
>> error? You didn't say that before.
>>
>> The bgwriter only works on dirty shared buffers, so the only way this
>> could be happening is if a page of a temp table had gotten loaded
>> into
>> a shared buffer, which isn't supposed to happen really. Is it
>> possible
>> that you had some backend deliberately trying to read a temp table
>> created by another backend? (You don't have to assume that the
>> interloper tried to modify the table; a mere SELECT could have
>> created
>> the dirty-buffer condition due to hint-bit update. You do have to
>> assume that the interloper was superuser, though, else permissions
>> would have stopped him from accessing someone else's temp table.)
>>
>> regards, tom lane
>>
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 17:51:12
Message-ID: 21074.1121363472@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas F. O'Connell" <tfo(at)sitening(dot)com> writes:
> Unfortunately, this is a system where the interloper is superuser
> (and, yes, changing this has been a TODO). But even so, I need help
> understanding how one backend could access the temp table of another.

You'd have to do it pretty explicitly:

select * from pg_temp_NNN.foo ...

but it's certainly possible. I wouldn't expect any application to try
this sort of thing, but if someone was manually poking around on the
box, they might have been tempted to do it.

> Several temporary tables (some of which inherit from actual tables)
> are constructed.

Hmm ... a SELECT from one of the "actual tables" would then scan the
temp tables too, no?

Thinking about this, I seem to recall that we had agreed to make the
planner ignore temp tables of other backends when expanding an
inheritance list --- but I don't see anything in the code implementing
that, so it evidently didn't get done yet.

regards, tom lane


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 19:53:28
Message-ID: E32BA627-C833-4D89-996F-90BDE4C649A1@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Jul 14, 2005, at 12:51 PM, Tom Lane wrote:

> "Thomas F. O'Connell" <tfo(at)sitening(dot)com> writes:
>
>> Unfortunately, this is a system where the interloper is superuser
>> (and, yes, changing this has been a TODO). But even so, I need help
>> understanding how one backend could access the temp table of another.
>
> You'd have to do it pretty explicitly:
>
> select * from pg_temp_NNN.foo ...
>
> but it's certainly possible. I wouldn't expect any application to try
> this sort of thing, but if someone was manually poking around on the
> box, they might have been tempted to do it.

I can almost guarantee this is not the cause of the problem.

>> Several temporary tables (some of which inherit from actual tables)
>> are constructed.
>
> Hmm ... a SELECT from one of the "actual tables" would then scan the
> temp tables too, no?
>
> Thinking about this, I seem to recall that we had agreed to make the
> planner ignore temp tables of other backends when expanding an
> inheritance list --- but I don't see anything in the code implementing
> that, so it evidently didn't get done yet.

In which case, my guess is that we either need to disconnect the
temporary tables and not use inheritance or revert to a version of
the application that does not use temporary tables at all. Otherwise,
there's a risk of any query on a parent of the temp tables not
restricted by ONLY causing this to occur again, no?

I guess we've been dodging bullets this whole time and were affected
by two issues in postgres simultaneously: 1) bgwriter clogging and 2)
inheritance ignorance of other backend temp tables.

Nice perfect storm, eh? An unrestricted (e.g., non-ONLY) query run
against a _parent_ of a temporary table. Too bad it puts postgres in
an unusable state...

Thanks to everyone for assistance in the sleuthing process.

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: PgSQL General <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 21:08:48
Message-ID: 42210BD6-DE37-4B2D-8E12-072BD9908133@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

So my first instinct was to avoid use of temp tables in this scenario
altogether, but now I'm thinking all I might need to do is unhook the
temp tables from inheritance.

But I just want to raise a basic reliability issu raised in the
nearby "Autovacuum loose ends" thread issue before I conclude that
this approach is safe enough to prevent any more bgwriter errors:
does pg_autovacuum as currently written in contrib vacuum temp
tables, and, in 8.0, is this then able (however unlikely) to cause
the sort of error I encountered yesterday? Or was that thread only
talking about the new integrated version of the code as far as access
to temp tables are concerned?

If contrib/pg_autovacuum, temp tables, and bgwriter don't mix well,
I'll need to rethink our vacuum strategy.

Thanks!

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 14, 2005, at 2:53 PM, Thomas F. O'Connell wrote:

>>> Several temporary tables (some of which inherit from actual tables)
>>> are constructed.
>>>
>>
>> Hmm ... a SELECT from one of the "actual tables" would then scan the
>> temp tables too, no?
>>
>> Thinking about this, I seem to recall that we had agreed to make the
>> planner ignore temp tables of other backends when expanding an
>> inheritance list --- but I don't see anything in the code
>> implementing
>> that, so it evidently didn't get done yet.
>>
>
> In which case, my guess is that we either need to disconnect the
> temporary tables and not use inheritance or revert to a version of
> the application that does not use temporary tables at all.
> Otherwise, there's a risk of any query on a parent of the temp
> tables not restricted by ONLY causing this to occur again, no?
>
> I guess we've been dodging bullets this whole time and were
> affected by two issues in postgres simultaneously: 1) bgwriter
> clogging and 2) inheritance ignorance of other backend temp tables.
>
> Nice perfect storm, eh? An unrestricted (e.g., non-ONLY) query run
> against a _parent_ of a temporary table. Too bad it puts postgres
> in an unusable state...
>
> Thanks to everyone for assistance in the sleuthing process.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 21:16:56
Message-ID: 4539.1121375816@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

"Thomas F. O'Connell" <tfo(at)sitening(dot)com> writes:
> does pg_autovacuum as currently written in contrib vacuum temp
> tables, and, in 8.0, is this then able (however unlikely) to cause
> the sort of error I encountered yesterday?

No, and no, and still no for the integrated version. The
isOtherTempNamespace() checks in vacuum.c and analyze.c absolutely
guarantee this, whether or not the autovacuum code itself is bright
enough to exclude temp tables.

regards, tom lane


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 21:23:15
Message-ID: 20050714212314.GL19778@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Jul 14, 2005 at 04:08:48PM -0500, Thomas F. O'Connell wrote:
> So my first instinct was to avoid use of temp tables in this scenario
> altogether, but now I'm thinking all I might need to do is unhook the
> temp tables from inheritance.
>
> But I just want to raise a basic reliability issu raised in the
> nearby "Autovacuum loose ends" thread issue before I conclude that
> this approach is safe enough to prevent any more bgwriter errors:
> does pg_autovacuum as currently written in contrib vacuum temp
> tables, and, in 8.0, is this then able (however unlikely) to cause
> the sort of error I encountered yesterday? Or was that thread only
> talking about the new integrated version of the code as far as access
> to temp tables are concerned?

AFAICS contrib's pg_autovacuum ignores temp tables, so you're safe.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PgSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: ERROR: could not open relation
Date: 2005-07-14 21:44:19
Message-ID: E5ABD9F7-0B3B-405E-86E9-E7CCF2F9E8BA@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

From this thread, these two bits about PostgreSQL stand out:

"I have an old note to myself that persistent write errors could "clog"
the bgwriter, because I was worried that after an error it would
stupidly try to write the same buffer again instead of trying to make
progress elsewhere. (CVS tip might be better about this, I'm not sure.)
A dirty buffer for a file that doesn't exist anymore would certainly
qualify as a persistent failure."

and

"Hmm ... a SELECT from one of the "actual tables" would then scan the
temp tables too, no?

Thinking about this, I seem to recall that we had agreed to make the
planner ignore temp tables of other backends when expanding an
inheritance list --- but I don't see anything in the code implementing
that, so it evidently didn't get done yet."

I don't immediately see TODO items correpsonding to these. Should
there be some? Or do these qualify as bugs and should they be
submitted to that queue?

Thanks again to all developers and community folk who lent insight
into this error -- diagnosis and recovery (which was, thankfully,
virtually non-existent).

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005


From: "Thomas F(dot) O'Connell" <tfo(at)sitening(dot)com>
To: PgSQL General <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: ERROR: could not open relation
Date: 2005-07-15 23:38:30
Message-ID: D356BC7F-6CBB-439E-BE54-979E96F8D22C@sitening.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

One final final question: my suspicion is no, but I just want to ask:
this would not affect all inherited tables with bgwriter, would it,
in scenarios where a persistent inherited table gets dropped while a
parent table is being queried? Could this result in a similar
scheduling conflict for bgwriter? I'm under the impression that this
is mostly an issue with the implementation of temp tables and the
planner, but I'd like confirmation from folks who can read the code
more easily...

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Strategic Open Source: Open Your i™

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005

On Jul 14, 2005, at 4:44 PM, Thomas F. O'Connell wrote:

> From this thread, these two bits about PostgreSQL stand out:
>
> "I have an old note to myself that persistent write errors could
> "clog"
> the bgwriter, because I was worried that after an error it would
> stupidly try to write the same buffer again instead of trying to make
> progress elsewhere. (CVS tip might be better about this, I'm not
> sure.)
> A dirty buffer for a file that doesn't exist anymore would certainly
> qualify as a persistent failure."
>
> and
>
> "Hmm ... a SELECT from one of the "actual tables" would then scan the
> temp tables too, no?
>
> Thinking about this, I seem to recall that we had agreed to make the
> planner ignore temp tables of other backends when expanding an
> inheritance list --- but I don't see anything in the code implementing
> that, so it evidently didn't get done yet."
>
> I don't immediately see TODO items correpsonding to these. Should
> there be some? Or do these qualify as bugs and should they be
> submitted to that queue?
>
> Thanks again to all developers and community folk who lent insight
> into this error -- diagnosis and recovery (which was, thankfully,
> virtually non-existent).
>
> --
> Thomas F. O'Connell
> Co-Founder, Information Architect
> Sitening, LLC
>
> Strategic Open Source: Open Your i™
>
> http://www.sitening.com/
> 110 30th Avenue North, Suite 6
> Nashville, TN 37203-6320
> 615-260-0005
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that
> your
> message can get through to the mailing list cleanly
>