Backend often crashing

From: gnotari(at)linkgroup(dot)it
To: Postgresql General <pgsql-general(at)postgresql(dot)org>
Subject: Backend often crashing
Date: 2003-04-02 18:35:07
Message-ID: OFBC159CD4.B139A001-ONC1256CFC.006617BD-C1256CFC.006617DB@linkgroup.it
Lists: pgsql-general

<P>I'm resending the message as a reminder.</P><P>I'm adding my findings at the bottom.</P><P>&nbsp;</P><P>&gt;<FONT FACE="Courier">I have one of those nasty problems, with Postgres backend often crashing<BR>&gt;with signal 11.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">I'll do my best to give you the details:<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">Postgres is 7.2.1, more exactly the Debian package 7.2.1-2 from the Stable<BR>&gt;(Woody) distribution -- I'm forwarding a copy of this message to Debian's<BR>&gt;package maintainer.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">Postgres is running as a backend for a well known Italian web site, running<BR>&gt;on Zope (version 2.6.1 with psycopg Python adapter, v.1.1)<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">The problem is recent, i.e. never happened until last month or so, on this<BR>&gt;same setup.<BR>&gt;I have a few other machines, running the same software setup, but different<BR>&gt;Zope sites, never experiencing any problem.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">These are the relevant lines from syslog<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">Feb 20 14:43:53 speed postgres[13365]: [25] DEBUG: &nbsp;server process (pid<BR>&gt;15906) was terminated by signal 11<BR>&gt;Feb 20 14:43:53 speed postgres[13365]: [26] DEBUG: &nbsp;terminating any other<BR>&gt;active server processes<BR>&gt;Feb 20 14:43:53 speed postgres[15908]: [26-1] NOTICE: &nbsp;Message from<BR>&gt;PostgreSQL backend:<BR>&gt;Feb 20 14:43:53 speed postgres[15908]: [26-2] ^IThe Postmaster has informed<BR>&gt;me that some other backend<BR>&gt;Feb 20 14:43:53 speed postgres[15908]: [26-3] ^Idied abnormally and<BR>&gt;possibly corrupted shared memory.<BR>&gt;Feb 20 14:43:53 speed postgres[15908]: [26-4] ^II have rolled back the<BR>&gt;current transaction and am<BR>&gt;Feb 20 14:43:53 speed postgres[15908]: [26-5] ^Igoing to terminate your<BR>&gt;database system connection and exit.<BR>&gt;Feb 20 14:43:53 speed postgres[15908]: [26-6] ^IPlease reconnect 
to the<BR>&gt;database system and repeat your query.<BR>&gt;Feb 20 14:43:53 speed postgres[15904]: [26-1] NOTICE: &nbsp;Message from<BR>&gt;PostgreSQL backend:<BR>&gt;Feb 20 14:43:53 speed postgres[15904]: [26-2] ^IThe Postmaster has informed<BR>&gt;me that some other backend<BR>&gt;Feb 20 14:43:53 speed postgres[15904]: [26-3] ^Idied abnormally and<BR>&gt;possibly corrupted shared memory.<BR>&gt;Feb 20 14:43:53 speed postgres[15904]: [26-4] ^II have rolled back the<BR>&gt;current transaction and am<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">I immediately thought of a hardware problem but, having an equivalent<BR>&gt;machine online, I dumped the db and moved to that.<BR>&gt;The problem manifested at once on the other machine, which had previously<BR>&gt;(~1 month before) &nbsp;run the site without any error.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">The two machines have the same software setup, but different Linux kernels<BR>&gt;(2.4.19 vs 2.4.20, reiserfs vs ext3), and different hardware.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">I cannot reproduce the problem reliably, though on the production machine<BR>&gt;the database crashes many times an hour.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">It _seems_ to be related to some mildly convoluted query (a SELECT only<BR>&gt;query). Running that query manually, I managed to crash the backend only<BR>&gt;once.<BR>&gt;VACUUM FULL never gave any error, nor did pg_dump.<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">I obtained some (pretty large, ~90MB) core files from the crashes. 
The<BR>&gt;backtrace is consistent between the files, here it is:<BR>&gt;</FONT><BR>&gt;<FONT FACE="Courier">#0 &nbsp;0x08157e92 in MemoryContextReset ()<BR>&gt;#1 &nbsp;0x08157eb9 in MemoryContextResetChildren ()<BR>&gt;#2 &nbsp;0x08157e8b in MemoryContextReset ()<BR>&gt;#3 &nbsp;0x08157eb9 in MemoryContextResetChildren ()<BR>&gt;#4 &nbsp;0x08157e8b in MemoryContextReset ()<BR>&gt;#5 &nbsp;0x080c5c88 in ExecScan ()<BR>&gt;#6 &nbsp;0x080cb61a in ExecSeqScan ()<BR>&gt;#7 &nbsp;0x080c4139 in ExecProcNode ()<BR>&gt;#8 &nbsp;0x080cbe2c in ExecSort ()<BR>&gt;#9 &nbsp;0x080c41c9 in ExecProcNode ()<BR>&gt;#10 0x080ca630 in ExecMergeJoin ()<BR>&gt;#11 0x080c4189 in ExecProcNode ()<BR>&gt;#12 0x080cbe2c in ExecSort ()<BR>&gt;#13 0x080c41c9 in ExecProcNode ()<BR>&gt;#14 0x080cc0ae in ExecUnique ()<BR>&gt;#15 0x080c41d9 in ExecProcNode ()<BR>&gt;#16 0x080cd5d5 in ExecReScanSetParamPlan ()<BR>&gt;#17 0x080c5cac in ExecScan ()<BR>&gt;#18 0x080cd5f6 in ExecSubqueryScan ()<BR>&gt;#19 0x080c4169 in ExecProcNode ()<BR>&gt;#20 0x080c73f8 in ExecProcAppend ()<BR>&gt;#21 0x080c4129 in ExecProcNode ()<BR>&gt;#22 0x080cbe2c in ExecSort ()<BR>&gt;#23 0x080c41c9 in ExecProcNode ()<BR>&gt;#24 0x080cb9a6 in ExecSetOp ()<BR>&gt;#25 0x080c41e9 in ExecProcNode ()<BR>&gt;#26 0x080cbe2c in ExecSort ()<BR>&gt;#27 0x080c41c9 in ExecProcNode ()<BR>&gt;#28 0x080c30fe in ExecutorEnd ()<BR>&gt;#29 0x080c2797 in ExecutorRun ()<BR>&gt;#30 0x081104de in ProcessQuery ()<BR>&gt;#31 0x0810ed70 in pg_exec_query_string ()<BR>&gt;#32 0x0810fd5e in PostgresMain ()<BR>&gt;#33 0x080f6d4e in ClosePostmasterPorts ()<BR>&gt;#34 0x080f669f in ClosePostmasterPorts ()<BR>&gt;#35 0x080f5882 in PostmasterMain ()<BR>&gt;#36 0x080f5391 in PostmasterMain ()<BR>&gt;#37 0x080d4e18 in main ()<BR>&gt;#38 0x401d114f in __libc_start_main () from /lib/libc.so.6</FONT></P><P>&nbsp;</P><P><FONT FACE="Courier">As it turned out, switching to 7.2.4 gave no result. 
The errors are still there.</FONT></P><P><FONT FACE="Courier">But now, at least, I have a clue. The error seemed to be triggered almost exclusively by a search function on the web site. </FONT></P><P><FONT FACE="Courier">The code turned out to call the Postgres to_ascii() function extensively. &nbsp;I have reason to suspect that the database contains, in text fields, characters which do not belong to the selected encoding (LATIN1).</FONT></P><P><FONT FACE="Courier">So, I fancied, one possible culprit was the to_ascii function choking on some strange character.</FONT></P><P><FONT FACE="Courier">I replaced the occurrences of to_ascii with a custom function that calls to_ascii only on the result of a translate, which in turn converts some strange (Russian?) characters to plain ASCII.</FONT></P><P><FONT FACE="Courier">The errors dropped sharply; the few remaining ones don't seem to be related to that search function.</FONT></P><P><FONT FACE="Courier">Of course, this is not conclusive: I have yet to reproduce the error reliably on a single, selected data row, but I think what I found is worth reporting.</FONT></P><P>&nbsp;</P><P><FONT FACE="Courier">Thanks to the development team!</FONT></P><P>&nbsp;</P><P><FONT FACE="Courier">ciao</FONT></P><P><FONT FACE="Courier">Guido</FONT></P><P></P>
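
For readers who want to try the same idea outside the database, here is a minimal sketch in Python (not the SQL function used on the site, which the message does not show). It assumes a small, hypothetical table of stray Cyrillic characters; the original message does not say which characters were translated. `suspect_chars` lists characters that cannot be encoded as LATIN1, which may help narrow the crash down to a single row, and `safe_to_ascii` mimics the "translate first, then convert to ASCII" wrapper. It only approximates what to_ascii() does in Postgres.

```python
import unicodedata

# Hypothetical mapping of stray Cyrillic letters to ASCII look-alikes.
# The original message does not list the characters that were translated.
STRAY_TO_ASCII = str.maketrans({
    "\u0410": "A",  # CYRILLIC CAPITAL LETTER A
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0415": "E",  # CYRILLIC CAPITAL LETTER IE
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u041e": "O",  # CYRILLIC CAPITAL LETTER O
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
})


def suspect_chars(text):
    """Return the characters in text that cannot be encoded as LATIN1."""
    found = []
    for ch in text:
        try:
            ch.encode("latin-1")
        except UnicodeEncodeError:
            found.append(ch)
    return found


def safe_to_ascii(text):
    """Translate known stray characters, then fold the rest to plain ASCII.

    Accented Latin letters lose their accents; anything still outside
    ASCII after that is dropped rather than passed on.
    """
    translated = text.translate(STRAY_TO_ASCII)
    decomposed = unicodedata.normalize("NFKD", translated)
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

Running `suspect_chars` over the text columns touched by the search query is one way to look for the rows that might trigger the crash, under the unconfirmed assumption that out-of-encoding characters are the cause.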
