Bug #671: server corrupt

Lists: pgsql-bugs
From: pgsql-bugs(at)postgresql(dot)org
To: pgsql-bugs(at)postgresql(dot)org
Subject: Bug #671: server corrupt
Date: 2002-05-22 12:41:17
Message-ID: 20020522124117.D192D475A95@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Dmitry Riachtchentsev (diamondrain(at)mail(dot)ru) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
server corrupt

Long Description
The follow short sequence of SQL requests leads to server corrupt:
1. Run psql and type:
begin;
CREATE SEQUENCE B;
commit;
\q

2. After exiting run psql again and type:
begin;
SELECT nextval('B');
DROP SEQUENCE B;
CREATE SEQUENCE B1;
commit;

> Here I get: "NOTICE: LockRelease: no such lock"
> Then type:

begin;
CREATE SEQUENCE B2;

> Here I get messages:
---------------------------------------------------------------
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
Failed.
---------------------------------------------------------------

Then until restarting the server, connection to the server can not be established.
The database log is:
---------------------------------------------------------------
DEBUG: database system was shut down at 2002-05-21 11:40:09 EDT
DEBUG: checkpoint record is at 0/276B98
DEBUG: redo record is at 0/276B98; undo record is at 0/0; shutdown TRUE
DEBUG: next transaction id: 1421; next oid: 24991
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

NOTICE: LockRelease: no such lock
DEBUG: server process (pid 10918) was terminated by signal 10
DEBUG: terminating any other active server processes

NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
NOTICE: Message from PostgreSQL backend:
The Postmaster has informed me that some other backend
died abnormally and possibly corrupted shared memory.
I have rolled back the current transaction and am
going to terminate your database system connection and exit.
Please reconnect to the database system and repeat your query.
DEBUG: all server processes terminated; reinitializing shared memory and semaphores
DEBUG: database system was interrupted at 2002-05-21 11:40:46 EDT
DEBUG: checkpoint record is at 0/276B98
DEBUG: redo record is at 0/276B98; undo record is at 0/0; shutdown TRUE
DEBUG: next transaction id: 1421; next oid: 24991
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: redo starts at 0/276BD8
DEBUG: ReadRecord: record with zero length at 0/28F640
DEBUG: redo done at 0/28F618
DEBUG: database system is ready
---------------------------------------------------------------

PostgreSQL version is 7.2.1
Operation system is SunOS 5.8

Is my sequence of requests legal?
Is there a patch that fixs this problem?
If there is no patch, what is the root of the problem? Is there a kit of rules to avoid this situation?

I am ready to provide you any more required information to reproduce and trace this situation. Please e-mail me.

Thanks
Dmitry

Sample Code

No file was uploaded with this report


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: diamondrain(at)mail(dot)ru, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug #671: server corrupt
Date: 2002-05-22 14:12:30
Message-ID: 23899.1022076750@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

pgsql-bugs(at)postgresql(dot)org writes:
> The follow short sequence of SQL requests leads to server corrupt:

Looks like a bug to me too. I can still replicate the "NOTICE:
LockRelease: no such lock" in current sources, but it seems that
someone's already fixed the core dump; CVS tip does not crash.
Will look into it more...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: diamondrain(at)mail(dot)ru, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Bug #671: server corrupt
Date: 2002-05-22 16:08:42
Message-ID: 6246.1022083722@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

pgsql-bugs(at)postgresql(dot)org writes:
> Is there a patch that fixs this problem?
> If there is no patch, what is the root of the problem? Is there a kit of rules to avoid this situation?

After more detailed investigation, I find both the notice and the
subsequent crash have the same cause: after nextval(), sequence.c is
holding a reference to an open relation cache entry for the sequence,
which it intends to close at end of transaction. When you drop the
sequence later in the same transaction, the cache entry goes away,
leaving sequence.c with a dangling pointer. Its attempt to close the
cache entry at commit not only causes the NOTICE complaint from the lock
manager, but also modifies memory that may by now have been assigned to
something else. The crash in your example was due to another cache
entry having been corrupted in this way.

A proper fix will require revising the way sequence.c holds its state;
that strikes me as too large a change to risk back-patching into 7.2.*
--- especially for a bug that has gone unnoticed for many years. AFAIK
sequence.c has always been implemented like this, and thus has always
had this problem.

I will fix it for 7.3, but in the meantime I suggest not dropping a
sequence that you've nextval'd, currval'd, or setval'd earlier in the
same transaction. As a stopgap defense I will probably back-patch
something to make it error out if you try.

regards, tom lane