Re: [GENERAL] cache lookup of relation 165058647 failed

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Sean Chittenden <sean(at)chittenden(dot)org>
Cc: PostgreSQL Bugs List <pgsql-bugs(at)postgresql(dot)org>, Juris Krumins <juriskr(at)komin(dot)lv>
Subject: Re: [GENERAL] cache lookup of relation 165058647 failed
Date: 2004-05-06 03:30:11
Message-ID: 4099B143.8020006@Yahoo.com
Lists: pgsql-bugs pgsql-general

Sean Chittenden wrote:

>>>> I've found out that this error occurs in the
>>>> dependency.c file:
>>>>
>>>> 2004-04-26 11:09:34 ERROR: dependency.c 1621: cache lookup of relation 149064743 failed
>>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>>> 2004-04-26 11:09:34 ERROR: Relation "tmp_table1" does not exist
>>>>
>>>> in the getRelationDescription(StringInfo buffer, Oid relid) function.
>>>>
>>>> Any ideas what can cause these errors?
>>> <aol>Me too.</aol>
>>> But I suspect that it's a race condition with the new
>>> background writer code. I've started testing a new database design
>>> and was able to reproduce this on my laptop nearly 90% of the time,
>>> but could only reproduce it about 10% of the time on my production
>>> databases, until I figured out what the difference was: fsync.
>>
>> Temp tables don't use the shared buffer cache, so how can this be
>> related to the BG writer?
>
> Don't the system catalogs use the shared buffer cache?
>
> BEGIN;
> SELECT create_temp_table_func(); -- Inserts a row into pg_class via CREATE TEMP TABLE
> -- Do other stuff
> COMMIT; -- After the commit, the row is now visible to other backends
> -- disconnect -- If the delay between the disconnect and reconnect is small enough
> -- reconnect -- It's as though there is a race condition that allows the function
>               -- pg_table_is_visible() to assert the "cache lookup of relation" error.
> BEGIN;
> SELECT create_temp_table_func(); -- Before the CREATE TEMP TABLE, I call
> /* SELECT TRUE FROM pg_catalog.pg_class c
>      LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
>     WHERE c.relname = ''footmp''::TEXT AND
>           c.relkind = ''r''::TEXT AND
>           pg_catalog.pg_table_is_visible(c.oid); */
> -- But the query fails
>
> My guess was that the series of events went something like:
>
> proc 0) COMMITs and the row in pg_class is committed
> proc 1) bgwriter code removes a page from the cache
> proc 2) queries for the page [*]
> proc 1) writes it to disk
> proc 2) queries for the page [*]
> proc 1) syncs the fd
>
> [*] proc 2 queries for the page at either of these points
>
> In 7.4, there is no bgwriter or background process mucking with cache,

Except for the checkpoint process, which does exactly the same as the
bgwriter does, and ALL concurrent backends whenever they feel the need
to evict a dirty buffer.

If it makes a difference to the visibility rules for the tuples on a
pg_class page whether that page is dirty in the buffer or has been
copied out to disk, then the whole thing is a way larger bug than the
one in MIB. First of all, committed or not, a temp object from one
session should NEVER be visible in any other.
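
A toy model of the invariant described above may help. It is plain
Python, not PostgreSQL code, and every name in it (ToyBufferCache,
observe_page_states, the "pg_class:0" page id) is invented for
illustration. It assumes only that readers always go through the cache,
so a page should look the same whether it is dirty in a buffer, flushed
but still cached, or evicted and read back from disk.

```python
class ToyBufferCache:
    """Write-back cache sketch: readers go through the cache, never straight to disk."""

    def __init__(self):
        self.disk = {}      # page_id -> contents persisted on "disk"
        self.buffers = {}   # page_id -> contents held in memory
        self.dirty = set()  # page_ids whose in-memory copy is newer than disk

    def write(self, page_id, contents):
        # Modify the page in memory only and mark it dirty.
        self.buffers[page_id] = contents
        self.dirty.add(page_id)

    def flush(self, page_id):
        # Roughly what a background writer or checkpoint does:
        # copy the page out to disk but keep it in the buffer.
        if page_id in self.dirty:
            self.disk[page_id] = self.buffers[page_id]
            self.dirty.discard(page_id)

    def evict(self, page_id):
        # Roughly what any backend does when it needs the slot:
        # flush first if dirty, then drop the in-memory copy.
        self.flush(page_id)
        self.buffers.pop(page_id, None)

    def read(self, page_id):
        # A miss reloads the page from disk into the cache.
        if page_id not in self.buffers:
            self.buffers[page_id] = self.disk[page_id]
        return self.buffers[page_id]


def observe_page_states():
    """Read the same hypothetical page in each of its three states."""
    cache = ToyBufferCache()
    cache.write("pg_class:0", ("tmp_table1", "committed"))
    seen = [cache.read("pg_class:0")]      # dirty in the buffer
    cache.flush("pg_class:0")
    seen.append(cache.read("pg_class:0"))  # clean, still in the buffer
    cache.evict("pg_class:0")
    seen.append(cache.read("pg_class:0"))  # reloaded from disk
    return seen
```

In this sketch all three reads return the same tuple. A real system in
which the result depended on the flush timing would be violating that
invariant, independent of which process did the writing.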

Jan

> which is why this works 100% of the time. In 7.5, however, there's a
> 200ms gap where a race condition appears and pg_table_is_visible()
> fails its PointerIsValid() check. If I put a sleep in, the sleep gives
> the bgwriter enough time to commit the pages to disk so that the
> queries for the page happen after the fd's been sync()'ed.
>
> I have no other clue as to why this would be happening though, so
> believe me when I say, I could very well be quite wrong.... but this is
> my best, quasi-educated/grep(1)'ed guess.
>
> -sc
>

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #
