Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000

Lists: pgsql-bugs
From: "Jean-Pierre Pelletier" <pelletier_32(at)sympatico(dot)ca>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Date: 2005-10-05 18:58:35
Message-ID: BAYC1-PASMTP04309BC5E20E4767AF591195820@CEZ.ICE

I'll recompile with the trace, that's no problem,
and install the patched release tonight.

After your last email, I excluded the PostgreSQL
directory from the antivirus, because I could do that without
rebooting.

I was also sometimes getting read/write or open
errors ("Invalid argument") without the server crashing.
If I haven't seen any of these error messages after
two days, there is a very high chance that the problem
has been fixed by turning off the antivirus.

Jean-Pierre Pelletier



From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Date: 2005-10-05 21:16:22
Message-ID: di15a5$1ca1$1@news.hub.org


""Jean-Pierre Pelletier"" <pelletier_32(at)sympatico(dot)ca> wrote in message
news:003801c5c9b0$03e08500$6401a8c0(at)JP(dot)(dot)(dot)
>
> Yes, there is antivirus software on the machine; a reboot is needed
> when it's turned off.
> I'll be allowed to reboot it tonight, or I'll do it sooner if it crashes
> before that.
>
> There are around 15 connections to PostgreSQL when it crashes, but most
> are idle. There may be a few inserts but no bulk inserts; the biggest
> load would come from select statements.
>

We haven't yet established whether the failed reads/writes are caused by
the anti-virus software or by intensive read/write activity. If you can
compile the source, could you patch smgrread()/smgrwrite() like this to
capture the native Windows error:

void
smgrwrite(SMgrRelation reln, BlockNumber blocknum, char *buffer, bool isTemp)
{
    if (!(*(smgrsw[reln->smgr_which].smgr_write)) (reln, blocknum, buffer,
                                                    isTemp))
        ereport(ERROR,
                (errcode_for_file_access(),
                 /* the number printed just before %m is the native Windows error code */
                 errmsg("could not write block %u of relation %u/%u/%u:%lu: %m",
                        blocknum,
                        reln->smgr_rnode.spcNode,
                        reln->smgr_rnode.dbNode,
                        reln->smgr_rnode.relNode,
                        (unsigned long) GetLastError())));
}
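
The idea behind the patch, recording the native error code right where the I/O call fails, can be sketched outside PostgreSQL roughly as follows. This is only an illustration: IoFailure, capture_io_failure() and describe_io_failure() are hypothetical names, not PostgreSQL functions, and on non-Windows builds the native code is simply reported as 0.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical record of a failed I/O call; not a PostgreSQL type. */
typedef struct IoFailure
{
    int           saved_errno;   /* C runtime errno at the point of failure */
    unsigned long native_error;  /* GetLastError() on Windows, 0 elsewhere */
} IoFailure;

/*
 * Hypothetical helper: call it immediately after a failed read/write/open,
 * before any other call has a chance to overwrite the error state.
 */
static IoFailure
capture_io_failure(void)
{
    IoFailure f;

    /* Read the native code first; GetLastError() does not modify errno. */
#ifdef _WIN32
    f.native_error = (unsigned long) GetLastError();
#else
    f.native_error = 0;
#endif
    f.saved_errno = errno;
    return f;
}

/* Format both codes into buf so that neither is lost in the log line. */
static void
describe_io_failure(const IoFailure *f, const char *what,
                    char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%s failed: errno=%d (%s), native error=%lu",
             what, f->saved_errno, strerror(f->saved_errno),
             f->native_error);
}
```

A caller would invoke capture_io_failure() on the line directly after the failing call and pass the result to describe_io_failure() when building the log message. Keeping both values matters because the C runtime may map several distinct Windows error codes to the same errno value.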

Regards,
Qingqing


From: "Jean-Pierre Pelletier" <pelletier_32(at)sympatico(dot)ca>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Date: 2005-10-07 15:19:25
Message-ID: BAYC1-PASMTP05C64DEF01BBD7502CD94B95840@CEZ.ICE

Turning off the antivirus fixed the problem.
We haven't had any read/write/open errors in more
than two days.

Thank you very much for your help and keep up the good work.

Our only remaining PostgreSQL problem is with pg_stat_activity
being unreliable and the statistics collector being restarted many times
every day.

Any idea what might be causing that?

Jean-Pierre Pelletier



From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Jean-Pierre Pelletier <pelletier_32(at)sympatico(dot)ca>
Cc: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Date: 2005-10-07 15:24:17
Message-ID: 20051007152417.GC8765@surnet.cl

On Fri, Oct 07, 2005 at 11:19:25AM -0400, Jean-Pierre Pelletier wrote:

> Our only remaining PostgreSQL problem is with pg_stat_activity
> being unreliable and the statistics collector being restarted many times
> every day.

The stats collector (which maintains pg_stat_activity, among other things)
uses a UDP socket to receive info from the backends, so if UDP
communication is crippled, it's going to be unreliable. Maybe there are
too many lost packets. I don't know what could cause it to die, though;
certainly not lost packets. (The postmaster restarts it
automatically if it detects it's not running.)
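
To make the point about lost packets concrete, here is a minimal sketch, assuming POSIX sockets (so not directly usable on Windows), of a sender and a receiver talking over a UDP socket on the loopback address. It is not the stats collector's code; udp_loopback_received() is a hypothetical name. It only shows that the sender gets no confirmation of delivery, so the receiver may see fewer datagrams than were sent.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Hypothetical demo: send `count` numbered datagrams to a UDP socket bound
 * to 127.0.0.1 and return how many the receiver actually got.
 * Returns -1 if the sockets could not be set up.
 */
static int
udp_loopback_received(int count)
{
    struct sockaddr_in addr;
    socklen_t addrlen = sizeof(addr);
    struct timeval timeout = {0, 200000};   /* give up after 200 ms idle */
    int rx, tx, i, received = 0;
    uint32_t seq;

    rx = socket(AF_INET, SOCK_DGRAM, 0);
    tx = socket(AF_INET, SOCK_DGRAM, 0);
    if (rx < 0 || tx < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                      /* let the kernel pick a port */
    if (bind(rx, (struct sockaddr *) &addr, sizeof(addr)) < 0 ||
        getsockname(rx, (struct sockaddr *) &addr, &addrlen) < 0 ||
        setsockopt(rx, SOL_SOCKET, SO_RCVTIMEO,
                   &timeout, sizeof(timeout)) < 0)
        goto fail;

    /* Sender: fire and forget; a successful sendto() proves nothing. */
    for (i = 0; i < count; i++)
    {
        seq = (uint32_t) i;
        (void) sendto(tx, &seq, sizeof(seq), 0,
                      (struct sockaddr *) &addr, sizeof(addr));
    }

    /* Receiver: drain until all arrived or the receive timeout expires. */
    while (received < count &&
           recv(rx, &seq, sizeof(seq), 0) == (ssize_t) sizeof(seq))
        received++;

    close(rx);
    close(tx);
    return received;

fail:
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return -1;
}
```

If the function returns less than the count passed in, the difference is the number of datagrams that were dropped or did not arrive in time. Something that interferes with local network traffic could plausibly produce that kind of gap, though nothing in this thread confirms it as the cause here.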

--
Alvaro Herrera http://www.advogato.org/person/alvherre
"Everybody understands Mickey Mouse. Few understand Hermann Hesse.
Hardly anybody understands Einstein. And nobody understands Emperor Norton."


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Date: 2005-10-07 19:08:15
Message-ID: di66hd$2qba$1@news.hub.org


""Jean-Pierre Pelletier"" <pelletier_32(at)sympatico(dot)ca> wrote
> Turning off the antivirus fixed the problem.
> We haven't had any read/write/open errors in more
> than two days.
>
> Thank you very much for your help and keep up the good work.
>

You are welcome :-) But I still doubt whether this really solves the problem
... By the way, may I know which anti-virus software you are using? And, if
possible, could you please turn the anti-virus software on again and
check the GetLastError() value?

A more detailed "guess" of the problem is here:
http://archives.postgresql.org/pgsql-hackers/2005-07/msg00489.php

Thanks a lot,
Qingqing


From: "Jean-Pierre Pelletier" <pelletier_32(at)sympatico(dot)ca>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: Possibly corrupted shared memory, PostgreSQL 8.1 beta2, Windows 2000
Date: 2005-10-11 12:36:16
Message-ID: BAYC1-PASMTP02F16CFF2DEF64C02F80DB95780@CEZ.ICE

The antivirus is CA eTrust EZ v 7.0.6.7.

I cannot put the antivirus back on that server
because it is now in production.

Jean-Pierre Pelletier
