Re: BUG #4941: pg_stat_statements crash

Lists: pgsql-bugspgsql-hackers
From: "" <alr(dot)nospamforme(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4941: pg_stat_statements crash
Date: 2009-07-24 13:14:40
Message-ID: 200907241314.n6ODEeMm078336@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


The following bug has been logged online:

Bug reference: 4941
Logged by:
Email address: alr(dot)nospamforme(at)gmail(dot)com
PostgreSQL version: 8.4.0
Operating system: windows 2008,2003
Description: pg_stat_statements crash
Details:

i was testing for issues with reattach memory patch fix
I believe that in windows 2008 , and 2003 shared_preload_libraries =
'pg_stat_statements' can crash postgres

2009-07-23 17:48:22 EDT 3820 LOG: server process (PID 4608) was
terminated by exception 0xC0000005
2009-07-23 17:48:22 EDT 3820 HINT: See C include file "ntstatus.h" for a
description of the hexadecimal value.
2009-07-23 17:48:22 EDT 3820 LOG: terminating any other active server
processes

so far have been able to reproduce this by running pgbench on mutable
systems. ( two non production and 2 clean installs on MS virtual Server )
machines have 3+ gig ram
shared_buffers = 94MB
max_connections = 300

pgbench.exe -i -s 300 pgbench
and running over and over
pgbench -c 100 -C from windows xp client
please not it may not crash on the first or second run but when it crashes
the first time
it will then crash every time after that even with reboots. it may be that
the 'pg_stat_statements' log somehow gets corrupted?
i am unable to reproduce if -C is not used ?
i have also noticed that normally task manager splits the load between a
lot of postgres processes but when it is turned on 1 postgres process uses
50% load.
I have tested both the postgres 8.4.0.9202 and 8.4.9177 postgress.exe and
both seem to have same issue .
hope this info helps
Allen


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "" <alr(dot)nospamforme(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 02:11:19
Message-ID: 20090727103830.E3CA.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

"" <alr(dot)nospamforme(at)gmail(dot)com> wrote:
> Bug reference: 4941
> PostgreSQL version: 8.4.0
> Operating system: windows 2008,2003
> Description: pg_stat_statements crash

> crash every time after that even with reboots.

I researched the issue, and found pgss_shmem_startup() is called
for each connection establishment. Then, shared structure might
corrupt because we read dumpfile into memory without any locks.
The problem seems to come from EXEC_BACKEND; shmem_startup_hook is
called only once on POSIX platforms, but many times on Windows.

We should call [Read dumpfile] routine only once even on Windows.
How about executing the routine during AddinShmemInitLock is taken?

The best solution might be to call shmem_startup_hook only once
every platforms, but it is difficult without fork().

[8.4.0]
pgss_shmem_startup(void)
{
LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
pgss = ShmemInitStruct("pg_stat_statements" &found);
if (!found)
{
[Initialize shared memory];
}
LWLockRelease(AddinShmemInitLock);
[Read dumpfile];
}

[To be]
pgss_shmem_startup(void)
{
LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
pgss = ShmemInitStruct("pg_stat_statements" &found);
if (!found)
{
[Initialize shared memory];
[Read dumpfile];
}
LWLockRelease(AddinShmemInitLock);
}

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "" <alr(dot)nospamforme(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 02:28:18
Message-ID: 21301.1248661698@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> We should call [Read dumpfile] routine only once even on Windows.

Seems to me that you should simply do the load only when found is false.

> How about executing the routine during AddinShmemInitLock is taken?

You should not hold AddinShmemInitLock any longer than really necessary.
I don't think there is a need for locking here anyway, though, since
found should be false only in the postmaster.

> The best solution might be to call shmem_startup_hook only once
> every platforms, but it is difficult without fork().

We're not changing that. This is a bug in pgss_shmem_startup.

regards, tom lane


From: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Cc: "" <alr(dot)nospamforme(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 02:54:59
Message-ID: 20090727114035.E3D2.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > We should call [Read dumpfile] routine only once even on Windows.
> Seems to me that you should simply do the load only when found is false.

Here is a patch to fix pg_stat_statements on Windows.

I see we don't need any locks because initialization is done in postmaster;
There are no chance to see uninitialized state of 'pgss' after relasing
AddinShmemInitLock and before load dumpfile into it.

I also check pgss_shmem_shutdown and no problem.
It is called only once from postmaster on shutdown.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
pg_stat_statements.patch application/octet-stream 518 bytes

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, alr(dot)nospamforme(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 03:30:42
Message-ID: 20090727033042.GG10327@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Itagaki Takahiro escribió:
>
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > > We should call [Read dumpfile] routine only once even on Windows.
> > Seems to me that you should simply do the load only when found is false.
>
> Here is a patch to fix pg_stat_statements on Windows.

Hmm, it seems the comment just above the patched line needs to be fixed.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, "" <alr(dot)nospamforme(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 03:35:46
Message-ID: 22731.1248665746@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Here is a patch to fix pg_stat_statements on Windows.

Yeah, that looks about right to me. Committed.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, alr(dot)nospamforme(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 03:38:18
Message-ID: 22805.1248665898@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Itagaki Takahiro escribi:
>> Here is a patch to fix pg_stat_statements on Windows.

> Hmm, it seems the comment just above the patched line needs to be fixed.

I looked at that and decided it was OK as-is. How do you want to
change it?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, alr(dot)nospamforme(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 03:48:58
Message-ID: 20090727034858.GI10327@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane escribió:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > Itagaki Takahiro escribi:
> >> Here is a patch to fix pg_stat_statements on Windows.
>
> > Hmm, it seems the comment just above the patched line needs to be fixed.
>
> I looked at that and decided it was OK as-is. How do you want to
> change it?

The reason that it doesn't need locks is not that there's no other
process running, but that it was already initialized, in the case when
found is false.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org, alr(dot)nospamforme(at)gmail(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4941: pg_stat_statements crash
Date: 2009-07-27 03:58:59
Message-ID: 23090.1248667139@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane escribi:
>> I looked at that and decided it was OK as-is. How do you want to
>> change it?

> The reason that it doesn't need locks is not that there's no other
> process running, but that it was already initialized, in the case when
> found is false.

Mph. The comment is correct, I think, but it applies to the situation
after we pass the !found test, rather than where the comment is. Maybe
we should just move it down one statement?

regards, tom lane