BUG #3427: Autovacuum crashed server

Lists: pgsql-bugs
From: "Rainer Bauer" <usenet(at)munnin(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3427: Autovacuum crashed server
Date: 2007-07-04 09:25:47
Message-ID: 200707040925.l649PlhB003879@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 3427
Logged by: Rainer Bauer
Email address: usenet(at)munnin(dot)com
PostgreSQL version: 8.2.4
Operating system: WinXP SP2
Description: Autovacuum crashed server
Details:

Hello,

My server was running now without any problems since Mid-May: "PostgreSQL
8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2
(mingw-special)"

Today was the first time it crashed (see log file below). Unfortunatelly I
changed log_min_messages to info yesterday, otherwise it would contain more
info.

At the time of the crash there were 4 connections open to the database and
at least one was "probably" doing an update.

I searched the archive and only found these threads, which are not directly
connected to Autovacuum:
<http://archives.postgresql.org/pgsql-bugs/2006-12/msg00163.php>
<http://archives.postgresql.org/pgsql-jdbc/2006-10/msg00109.php>
<http://archives.postgresql.org/pgsql-hackers/2006-10/msg01306.php>

One thing to note is that I have enabled "UseServerSidePrepare" for ODBC
connections last week, because Hiroshi Inoue recommended it:
<http://archives.postgresql.org/pgsql-odbc/2007-06/msg00052.php>
Since the server never crashed before I thought this might have something to
do with the failure.

Rainer

==============================================
These are my autovacuum settings:
autovacuum = on
autovacuum_vacuum_scale_factor = 0.15
autovacuum_analyze_scale_factor = 0.07

==============================================
2007-07-04 09:21:19 LOG: database system was shut down at 2007-07-03
23:45:14 Central Europe Standard Time
2007-07-04 09:21:19 LOG: checkpoint record is at A/31B884C8
2007-07-04 09:21:19 LOG: redo record is at A/31B884C8; undo record is at
0/0; shutdown TRUE
2007-07-04 09:21:19 LOG: next transaction ID: 0/27694933; next OID: 127547
2007-07-04 09:21:19 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2007-07-04 09:21:20 LOG: database system is ready
2007-07-04 10:53:00 LOG: autovacuum process (PID 1712) exited with exit
code -1073741819
2007-07-04 10:53:00 LOG: terminating any other active server processes
2007-07-04 10:53:00 WARNING: terminating connection because of crash of
another server process
2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to
the database and repeat your command.
2007-07-04 10:53:00 WARNING: terminating connection because of crash of
another server process
2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to
the database and repeat your command.
2007-07-04 10:53:00 WARNING: terminating connection because of crash of
another server process
2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to
the database and repeat your command.
2007-07-04 10:53:00 WARNING: terminating connection because of crash of
another server process
2007-07-04 10:53:00 DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2007-07-04 10:53:00 HINT: In a moment you should be able to reconnect to
the database and repeat your command.
2007-07-04 10:53:01 LOG: all server processes terminated; reinitializing
2007-07-04 10:53:01 LOG: database system was interrupted at 2007-07-04
10:53:00 Central Europe Standard Time
2007-07-04 10:53:01 LOG: checkpoint record is at A/3C572C78
2007-07-04 10:53:01 LOG: redo record is at A/3C3AD8F8; undo record is at
0/0; shutdown FALSE
2007-07-04 10:53:01 LOG: next transaction ID: 0/27849484; next OID: 135739
2007-07-04 10:53:01 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2007-07-04 10:53:01 LOG: database system was not properly shut down;
automatic recovery in progress
2007-07-04 10:53:01 LOG: redo starts at A/3C3AD8F8
2007-07-04 10:53:01 LOG: record with zero length at A/3D1F1E60
2007-07-04 10:53:01 LOG: redo done at A/3D1F1E30
2007-07-04 10:53:06 LOG: database system is ready


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Rainer Bauer <usenet(at)munnin(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3427: Autovacuum crashed server
Date: 2007-07-06 15:01:14
Message-ID: 20070706150114.GJ5797@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Rainer Bauer wrote:

> 2007-07-04 09:21:19 LOG: database system was shut down at 2007-07-03
> 23:45:14 Central Europe Standard Time
> 2007-07-04 09:21:19 LOG: checkpoint record is at A/31B884C8
> 2007-07-04 09:21:19 LOG: redo record is at A/31B884C8; undo record is at
> 0/0; shutdown TRUE
> 2007-07-04 09:21:19 LOG: next transaction ID: 0/27694933; next OID: 127547
> 2007-07-04 09:21:19 LOG: next MultiXactId: 1; next MultiXactOffset: 0
> 2007-07-04 09:21:20 LOG: database system is ready
> 2007-07-04 10:53:00 LOG: autovacuum process (PID 1712) exited with exit
> code -1073741819

Is there any way to figure out what "exit code -1073741819" means?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: usenet(at)munnin(dot)com
Cc: Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #3427: Autovacuum crashed server
Date: 2007-07-06 15:31:33
Message-ID: 20070706153133.GE5782@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


spam(at)munnin(dot)com wrote:
> Hello Alvaro,
>
> sorry to responding directly, but I haven't subscribed to the bugs
> mailing list.

Please CC the list anyway, so that
1. it is classified correctly on my end, and
2. so that somebody approves it and it ends up being posted to the list
eventually.

Note: I changed the reply address to the original one you used.
spam(at)munnin(dot)com sounds suspicious.

> AH> Is there any way to figure out what "exit code -1073741819" means?
>
> Yes, -1073741819 is 0xc0000005 aka "Access Violation".

Oh, OK. How can this happen? I don't know much about Windows. Maybe
you have some antivirus is checking the file at the same time Autovac is
checking it?

Also, it would be good to have a more complete report of what happenned.
Can you reproduce this problem? If so, please provide the whole ERROR
report (change log_min_messages I guess). Did it fail trying to open a
file?

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"Cómo ponemos nuestros dedos en la arcilla del otro. Eso es la amistad; jugar
al alfarero y ver qué formas se pueden sacar del otro" (C. Halloway en
La Feria de las Tinieblas, R. Bradbury)


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: usenet(at)munnin(dot)com, Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #3427: Autovacuum crashed server
Date: 2007-07-06 16:45:54
Message-ID: 468E71C2.40903@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera wrote:
>> AH> Is there any way to figure out what "exit code -1073741819" means?
>>
>> Yes, -1073741819 is 0xc0000005 aka "Access Violation".
>
> Oh, OK. How can this happen? I don't know much about Windows. Maybe
> you have some antivirus is checking the file at the same time Autovac is
> checking it?

Note that Windows can (and often will) give you Access Violation when
you try to access any memory outside of your address-space (since that's
technically an access violation), so it can be whatever would cause
segmentation fault on unix.

//Magnus


From: Rainer Bauer <usenet(at)munnin(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3427: Autovacuum crashed server
Date: 2007-07-06 19:55:10
Message-ID: ck6t831rfsekpp97bvan6s5l25su54tjhg@4ax.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hello Alvaro,

>> AH> Is there any way to figure out what "exit code -1073741819" means?
>>
>> Yes, -1073741819 is 0xc0000005 aka "Access Violation".
>
>Oh, OK. How can this happen? I don't know much about Windows.

Like Magnus already wrote: accessing memory outside your adress space gives
you an Access Violation on Windows. However, most of the time this means a
NULL pointer was dereferenced.

>Maybe you have some antivirus is checking the file at the same time Autovac is checking it?

Yes this could be possible. The postgres server is running on my developer
machine together with an Antivirus software. But like I said this machine was
running now since May without any problems (i.e. Postgres never crashed).

>Also, it would be good to have a more complete report of what happenned.
>Can you reproduce this problem?

No. The server and the 2 programs accessing the server were happily running
for several weeks without any problem.

>If so, please provide the whole ERROR report (change log_min_messages I guess).
>Did it fail trying to open a file?

What I can do is adjust log_min_messages and report back if Autovacuum should
ever crash again. What value do you want it to be?
And should I also adjust log_min_error_statement or any other setting (I have
enough space left on the log partition, so there shouldn't be a problem).

Rainer