Re: ERROR: type "temp_gc" already exists

Lists: pgsql-general
From: Damon Hart <dhcom(at)sundial(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: type "temp_gc" already exists
Date: 2005-09-27 18:08:01
Message-ID: 43398A81.4020805@sundial.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi all -

I came across this thread after intermittently observing very similar
error messages last week from a PostgreSQL (8.0.2) server, e.g.:

SQL error: = 'type "local_roll" already exists' in line 1984

(only the portion in quotes is generated by the server, the context is
from the client application.) I followed up with the suggested action,
including testing of the hardware involved, on the original and a backup
server without clearing up the issue. It recurs and I am at a loss for
what I might do to further diagnose/isolate/fix the problem.

There were several identical errors, referencing both tables and
indexes, but always objects in a temporary schema and always with a
client app which creates and drops many temp tables and indexes.
Following Tom Lane's suggestion, a check of pg_depend never reveals the
presence of the affected name, using a very simple query like 'SELECT
objid, refobjid FROM pg_depend' which should not use the index
(confirmed by EXPLAIN) and simply grepping through the output for the
target values. After confirming the absence of a pg_depend entry, I ran
queries "DROP TYPE pg_temp_NNN.local_roll;" for each pg_temp_NNN schema
and similarly for the other affected types. On one schema the DROP TYPE
would succeed (not found, as expected, in the rest.) However, the
problem would recur with the same or a different name, so the incorrect
entries were being regenerated somehow.

Trying to get handle on this, I did a dump/restore to initialize the
system tables/indexes to a coherent state (I'm not a PG expert, but
'pg_dump <database> | grep pg_depend' fails, so I presume the restore
rebuilds the system tables.) From this starting point, the errors still
occurred. I ran several filesystem and smartmontools checks of the disks
involved and some memory tests for good measure without any indication
of a problem.

I restored the same database image to a backup server. I did not
immediately get the same errors on the backup server (like I said, it's
intermittent) but in the log file on a restart of the backup server
after < 24 hours usage I see:

WARNING: index "pg_depend_depender_index" contains 6195 row versions,
but table contains 6192 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_depend_reference_index" contains 6195 row versions,
but table contains 6192 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_depend_depender_index" contains 6195 row versions,
but table contains 6192 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_depend_reference_index" contains 6195 row versions,
but table contains 6192 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_attribute_relid_attnam_index" contains 4604 row
versions, but table contains 4594 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_attribute_relid_attnum_index" contains 4604 row
versions, but table contains 4594 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_attribute_relid_attnam_index" contains 4604 row
versions, but table contains 4594 row versions
HINT: Rebuild the index with REINDEX.
WARNING: index "pg_attribute_relid_attnum_index" contains 4604 row
versions, but table contains 4594 row versions
HINT: Rebuild the index with REINDEX.

This seems at least tangentially related, based on Tom's expectation of
a corrupt pg_depend index.

Sorry I haven't been able to diagnose this any more precisely, but I
think a hardware explanation is very doubtful. I'd be happy to run
tests or provide further details of the application usage, server
environment, etc. - I'm just not sure what's relevant to the issue iat
hand. I'd be even happier if someone could reasonably state that this
didn't threaten the integrity of my stored data . . .

regards,

Damon Hart

>Date: Mon, 12 Sep 2005 15:43:15 -0400
>From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>To: Janning Vygen <vygen(at)gmx(dot)de>
>Cc: pgsql-general(at)postgresql(dot)org
>Subject: Re: ERROR: type "temp_gc" already exists
>Message-ID: <24056(dot)1126554195(at)sss(dot)pgh(dot)pa(dot)us>
>
>Janning Vygen <vygen(at)gmx(dot)de> writes:
>
>
>>> Am Samstag, 10. September 2005 18:05 schrieb Tom Lane:
>>
>>
>>>>> If there's no pg_depend entry then DROP TYPE should work. Otherwise
>>>>> you might have to resort to manually DELETEing the pg_type row.
>>>
>>>
>
>
>
>>> Thanks for your detailed answer. I don't want to do anything wrong. To be
>>> sure, i have some more questions:
>>
>>
>
>
>
>>> - There is no entry in pg_depend. Should i just drop the entry from pg_type or
>>> should i REINDEX anyway?
>>
>>
>
>Well, what did you do to check that there was no entry? If the index is
>corrupt and you issued a query that used the index, it might have failed
>to find an entry that's actually there in the table (in fact, if we're
>assuming the DROP TYPE didn't happen because the system didn't find the
>dependency row while dropping the table, this is pretty much exactly
>what you'd expect). I'd REINDEX and then check again.
>
>
>
>>> - How can things like this happen? Hardware failure? If yes, should i change
>>> my harddisk?
>>
>>
>
>Insufficient information to say. It wouldn't be a bad idea to run some
>disk tests though.
>
>
>
>>> [1] It's not clear to me if pg_depend is a "shared system catalog" because the
>>> docs say
>>> "any of the shared system catalogs (pg_database,
>>> pg_group, pg_shadow, or pg_tablespace)"
>>> Maybe the iteration is final, maybe it shows only examples)
>>
>>
>
>That's meant to be a complete list --- I've updated the documentation to
>make this clearer. But you could check for yourself:
> select relname from pg_class where relisshared;
>
> regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Damon Hart <dhcom(at)sundial(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: ERROR: type "temp_gc" already exists
Date: 2005-09-27 18:36:49
Message-ID: 9440.1127846209@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Damon Hart <dhcom(at)sundial(dot)com> writes:
> I came across this thread after intermittently observing very similar
> error messages last week from a PostgreSQL (8.0.2) server, e.g.:
> SQL error: = 'type "local_roll" already exists' in line 1984
> ...
> Sorry I haven't been able to diagnose this any more precisely, but I
> think a hardware explanation is very doubtful.

I agree, it smells more like a software problem. Can you generate a
test case, ie, a self-contained program someone could run that'd
reproduce the error (possibly after running a long time)?

regards, tom lane


From: Janning Vygen <vygen(at)gmx(dot)de>
To: pgsql-general(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damon Hart <dhcom(at)sundial(dot)com>
Subject: Re: ERROR: type "temp_gc" already exists
Date: 2005-09-28 09:50:53
Message-ID: 200509281150.54215.vygen@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi Tom, Hi Damon,

Am Dienstag, 27. September 2005 20:36 schrieb Tom Lane:
> Damon Hart <dhcom(at)sundial(dot)com> writes:
> > I came across this thread after intermittently observing very similar
> > error messages last week from a PostgreSQL (8.0.2) server, e.g.:
> > SQL error: = 'type "local_roll" already exists' in line 1984
> > ...
> > Sorry I haven't been able to diagnose this any more precisely, but I
> > think a hardware explanation is very doubtful.
>
> I agree, it smells more like a software problem. Can you generate a
> test case, ie, a self-contained program someone could run that'd
> reproduce the error (possibly after running a long time)?

I recently reported this problem and i would like to help solving it. But how
can i build a self-contained test-case? It just happens sometimes under load.
do you just need DB Schema and my frontend script which causes the error?

Or do you need a complete dump of the database? Does it make sense to generate
a dump of the database when the error occured?

Hardware can't be the reason in my opionion because it happens on two servers
at the same time under the same load. The servers are not related to each
other in anyway but both run exactly the same application. They run the same
hardware but it woul dsurprise me if two hardware failures happen at the same
time and produce the same software failure.

I got the error in two cases:

case 1:
a daemon calls a plpgsql function in Transaction mode "SERIALIZATION". This
function generates a TEMP TABLE for calculating some stuff like this:

EXECUTE $$
CREATE TEMP TABLE temp_gc AS
SELECT
[...]

And at the end of the function the temp table is dropped with
EXECUTE $$DROP TABLE temp_gc;$$;

This function is only called by a daemon which calculates some materialized
view. This function is never called by more than one session at the same
time.

It fails under load (load comes from test case 2) with
"TYPE temp_gc already exists"

case 2:
Many parallel sessions do the following:

BEGIN;
CREATE TEMP TABLE spiele (
sp_id int4,
sp_heimtore int4,
sp_gasttore int4,
sp_abpfiff boolean,
wb_name text,
sn_name text,
sp_termin timestamp
) ON COMMIT DROP;
INSERT INTO spiele ...
SELECT ...
COMMIT;

there is also a table "Spiele" in schema public. The session selects a result
which depends on the user given game results (table "spiele" saves game
results).

Under load it happens that the transaction fails with
TYPE spiele already exists.

In both cases i got 10-15 of type temp_gc and spiele in pg_type.

Then i run something like
for I in 1..20 do
DROP TYPE pg_temp_$I.spiele;
DROP TYPE pg_temp_$I.temp_gc;
done;

After this everything works fine again.

kind regards,
Janning Vygen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Janning Vygen <vygen(at)gmx(dot)de>
Cc: pgsql-general(at)postgresql(dot)org, Damon Hart <dhcom(at)sundial(dot)com>
Subject: Re: ERROR: type "temp_gc" already exists
Date: 2005-09-28 14:07:25
Message-ID: 5987.1127916445@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Janning Vygen <vygen(at)gmx(dot)de> writes:
> I recently reported this problem and i would like to help solving it. But how
> can i build a self-contained test-case? It just happens sometimes under load.

I didn't say you need to make it 100% reproducible; you just have to
make a case that someone else can run that will eventually produce the
error. The sort of poking and prying that will need to happen to debug
it will involve things you do not want done to your production database,
therefore we need to be able to make the error happen in a test setup.

You probably need to create a client script that will issue multiple
parallel queries that are similar to what your regular application does.
See for instance this discussion:
http://archives.postgresql.org/pgsql-hackers/2005-05/msg00613.php
If you're handy with C, pgbench might be a useful starting point.
But a script in perl python or tcl will be fine too.

regards, tom lane


From: Janning Vygen <vygen(at)gmx(dot)de>
To: pgsql-general(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Damon Hart <dhcom(at)sundial(dot)com>
Subject: Re: ERROR: type "temp_gc" already exists
Date: 2005-10-18 07:45:06
Message-ID: 200510180945.06689.vygen@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I tried to reproduce it, but it seems that my problem vanished since i
switched from pg_pconnect to pg_connect in PHP. Maybe this is of any help.
But in my understanding the reported failure should not be influenced by
selection of pg_connect vs. pg_pconnect.

i will report if this problem arises again.

kind regards,
janning

Am Mittwoch, 28. September 2005 16:07 schrieb Tom Lane:
> Janning Vygen <vygen(at)gmx(dot)de> writes:
> > I recently reported this problem and i would like to help solving it. But
> > how can i build a self-contained test-case? It just happens sometimes
> > under load.
>
> I didn't say you need to make it 100% reproducible; you just have to
> make a case that someone else can run that will eventually produce the
> error. The sort of poking and prying that will need to happen to debug
> it will involve things you do not want done to your production database,
> therefore we need to be able to make the error happen in a test setup.
>
> You probably need to create a client script that will issue multiple
> parallel queries that are similar to what your regular application does.
> See for instance this discussion:
> http://archives.postgresql.org/pgsql-hackers/2005-05/msg00613.php
> If you're handy with C, pgbench might be a useful starting point.
> But a script in perl python or tcl will be fine too.
>
> regards, tom lane