Re: Unresolved error 0xC0000409 on Windows Server

Lists: pgsql-hackers
From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-02 18:25:33
Message-ID: CAOcpbfqPLwLq_0BBT4srkbUZZ5kp52_qeeAeqE0UiTh58045_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

I am encountering an error on my Postgres installation for Windows Server
64-bit. The error was posted
here<http://archives.postgresql.org/pgsql-bugs/2012-09/msg00014.php>a
couple months ago; however, no solution was found on the pgsql-bugs
list,
so I am reposting it to pgsql-hackers in the hopes that someone will be
able to help. My error message is identical to the one previously posted:

2012-11-01 22:36:26 EDT LOG: 00000: server process (PID 7060) was
terminated by exception 0xC0000409
2012-11-01 22:36:26 EDT DETAIL: Failed process was running: INSERT
INTO *[snipped
SQL command]*
2012-11-01 22:36:26 EDT HINT: See C include file "ntstatus.h" for a
description of the hexadecimal value.
2012-11-01 22:36:26 EDT LOCATION: LogChildExit,
src\backend\postmaster\postmaster.c:2884
2012-11-01 22:36:26 EDT LOG: 00000: terminating any other active server
processes
2012-11-01 22:36:26 EDT LOCATION: HandleChildCrash,
src\backend\postmaster\postmaster.c:2682
2012-11-01 22:36:26 EDT WARNING: 57P00: terminating connection because of
crash of another server process
2012-11-01 22:36:26 EDT DETAIL: The postmaster has commanded this server
process to roll back the current transaction and exit, because another
server process exited abnormally and possibly corrupted shared memory.
2012-11-01 22:36:26 EDT HINT: In a moment you should be able to reconnect
to the database and repeat your command.
2012-11-01 22:36:26 EDT LOCATION: quickdie,
src\backend\tcop\postgres.c:2556
2012-11-01 22:36:26 EDT LOG: 00000: all server processes terminated;
reinitializing
2012-11-01 22:36:26 EDT LOCATION: PostmasterStateMachine,
src\backend\postmaster\postmaster.c:3135
2012-11-01 22:36:36 EDT FATAL: XX000: pre-existing shared memory block is
still in use
2012-11-01 22:36:36 EDT HINT: Check if there are any old server processes
still running, and terminate them.
2012-11-01 22:36:36 EDT LOCATION: PGSharedMemoryCreate,
src\backend\port\win32_shmem.c:194

The error happens regularly while performing database INSERTS. The [snipped
SQL command] part above contains the INSERT command that was executing when
the server crashed. After restarting the server the command executes fine,
so it's not a problem with the command. I installed Postgres from the
standard Windows binary "postgresql-9.2.1-1-windows-x64.exe" and I have not
changed any configuration settings from their default values.

Does anyone know what might be happening and how I might fix it?

Thanks,
Matt


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-02 19:05:47
Message-ID: CAHyXU0yPtVfxc6ZXR5SFd4tgj==KYGMkb4XfwmoOas_QLSJwWw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 2, 2012 at 1:25 PM, Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> wrote:
> Hello,
>
> I am encountering an error on my Postgres installation for Windows Server
> 64-bit. The error was posted here a couple months ago; however, no solution
> was found on the pgsql-bugs list, so I am reposting it to pgsql-hackers in
> the hopes that someone will be able to help. My error message is identical
> to the one previously posted:
>
> 2012-11-01 22:36:26 EDT LOG: 00000: server process (PID 7060) was
> terminated by exception 0xC0000409
> 2012-11-01 22:36:26 EDT DETAIL: Failed process was running: INSERT INTO
> [snipped SQL command]
> 2012-11-01 22:36:26 EDT HINT: See C include file "ntstatus.h" for a
> description of the hexadecimal value.
> 2012-11-01 22:36:26 EDT LOCATION: LogChildExit,
> src\backend\postmaster\postmaster.c:2884
> 2012-11-01 22:36:26 EDT LOG: 00000: terminating any other active server
> processes
> 2012-11-01 22:36:26 EDT LOCATION: HandleChildCrash,
> src\backend\postmaster\postmaster.c:2682
> 2012-11-01 22:36:26 EDT WARNING: 57P00: terminating connection because of
> crash of another server process
> 2012-11-01 22:36:26 EDT DETAIL: The postmaster has commanded this server
> process to roll back the current transaction and exit, because another
> server process exited abnormally and possibly corrupted shared memory.
> 2012-11-01 22:36:26 EDT HINT: In a moment you should be able to reconnect
> to the database and repeat your command.
> 2012-11-01 22:36:26 EDT LOCATION: quickdie,
> src\backend\tcop\postgres.c:2556
> 2012-11-01 22:36:26 EDT LOG: 00000: all server processes terminated;
> reinitializing
> 2012-11-01 22:36:26 EDT LOCATION: PostmasterStateMachine,
> src\backend\postmaster\postmaster.c:3135
> 2012-11-01 22:36:36 EDT FATAL: XX000: pre-existing shared memory block is
> still in use
> 2012-11-01 22:36:36 EDT HINT: Check if there are any old server processes
> still running, and terminate them.
> 2012-11-01 22:36:36 EDT LOCATION: PGSharedMemoryCreate,
> src\backend\port\win32_shmem.c:194
>
> The error happens regularly while performing database INSERTS. The [snipped
> SQL command] part above contains the INSERT command that was executing when
> the server crashed. After restarting the server the command executes fine,
> so it's not a problem with the command. I installed Postgres from the
> standard Windows binary "postgresql-9.2.1-1-windows-x64.exe" and I have not
> changed any configuration settings from their default values.
>
> Does anyone know what might be happening and how I might fix it?

hm, several times over the last couple of months (both on postgres 9.1
and 9.2), i've seen a similar crash, but on linux. It hits the log
like this:

Execution halted (~ 200x)
Error: segfault from C stack overflow
Execution halted (~ 30x)
LOG: server process (PID 19882) was terminated by signal 11: Segmentation fault
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back
the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.

note the lack of LOG in 'Execution halted', etc. This has happened
several times, on different servers using different workloads (but
always under load). As of yet, I've been unable to get a core but I
hope to get one next time it happens. I wonder if it's a similar
cause?

One thing I've been tempted to try is raising max_stack_depth to see
if that helps the problem.

merlin


From: Noah Misch <noah(at)leadboat(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-03 01:00:07
Message-ID: 20121103010007.GB28197@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 02, 2012 at 02:05:47PM -0500, Merlin Moncure wrote:
> On Fri, Nov 2, 2012 at 1:25 PM, Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> wrote:
> > I am encountering an error on my Postgres installation for Windows Server
> > 64-bit. The error was posted here a couple months ago; however, no solution
> > was found on the pgsql-bugs list, so I am reposting it to pgsql-hackers in
> > the hopes that someone will be able to help. My error message is identical
> > to the one previously posted:
> >
> > 2012-11-01 22:36:26 EDT LOG: 00000: server process (PID 7060) was
> > terminated by exception 0xC0000409
> > 2012-11-01 22:36:26 EDT DETAIL: Failed process was running: INSERT INTO
> > [snipped SQL command]

Could you post an anonymized query, post an anonymized query plan, and/or
describe the general nature of the query? Does it call functions? About how
many rows does it insert?

What server settings have you customized?
https://wiki.postgresql.org/wiki/Server_Configuration

If you could get a stack trace or minidump, that would be most helpful:
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows

Magnus's questions for the reporter of bug #7517 are relevant, too. Does the
system have any antivirus software installed?

> > 2012-11-01 22:36:26 EDT LOG: 00000: all server processes terminated;
> > reinitializing
> > 2012-11-01 22:36:26 EDT LOCATION: PostmasterStateMachine,
> > src\backend\postmaster\postmaster.c:3135
> > 2012-11-01 22:36:36 EDT FATAL: XX000: pre-existing shared memory block is
> > still in use
> > 2012-11-01 22:36:36 EDT HINT: Check if there are any old server processes
> > still running, and terminate them.
> > 2012-11-01 22:36:36 EDT LOCATION: PGSharedMemoryCreate,
> > src\backend\port\win32_shmem.c:194

This part smells like a bug in its own right.

> hm, several times over the last couple of months (both on postgres 9.1
> and 9.2), i've seen a similar crash, but on linux. It hits the log
> like this:
>
> Execution halted (~ 200x)
> Error: segfault from C stack overflow
> Execution halted (~ 30x)
> LOG: server process (PID 19882) was terminated by signal 11: Segmentation fault
> LOG: terminating any other active server processes

> note the lack of LOG in 'Execution halted', etc. This has happened
> several times, on different servers using different workloads (but
> always under load). As of yet, I've been unable to get a core but I
> hope to get one next time it happens. I wonder if it's a similar
> cause?

Google suggests those unadorned messages originate in R. Do the affected
systems use PL/R? If so ...

> One thing I've been tempted to try is raising max_stack_depth to see
> if that helps the problem.

... that probably won't help. Depending on the specifics of the situation,
*lowering* max_stack_depth might tend to give you an ERROR instead of a crash.
Or it might just give R a bit more stack space to devour before reaching the
same crash it would otherwise reach.

Thanks,
nm


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-04 00:47:08
Message-ID: CAOcpbfrrchP4XgAkXF3tJYFxQo2zd-41mVCfr9i=57AS-KBbMw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 2, 2012 at 9:00 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Fri, Nov 02, 2012 at 02:05:47PM -0500, Merlin Moncure wrote:
> > On Fri, Nov 2, 2012 at 1:25 PM, Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
> wrote:
> > > I am encountering an error on my Postgres installation for Windows
> Server
> > > 64-bit. The error was posted here a couple months ago; however, no
> solution
> > > was found on the pgsql-bugs list, so I am reposting it to
> pgsql-hackers in
> > > the hopes that someone will be able to help. My error message is
> identical
> > > to the one previously posted:
> > >
> > > 2012-11-01 22:36:26 EDT LOG: 00000: server process (PID 7060) was
> > > terminated by exception 0xC0000409
> > > 2012-11-01 22:36:26 EDT DETAIL: Failed process was running: INSERT
> INTO
> > > [snipped SQL command]
>
> Could you post an anonymized query, post an anonymized query plan, and/or
> describe the general nature of the query? Does it call functions? About
> how
> many rows does it insert?
>

Here is the command that was executing when the 0xC0000409 exception was
raised:

INSERT INTO places (bounding_box,country,full_name,id,name,type,url)

VALUES

(st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving,
TX','dce44ec49eb788f5','Irving','city','
http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),

(st_transform_null(ST_GeometryFromText('POLYGON((107.610398
-6.9006302,107.610398 -6.864448,107.637222 -6.864448,107.637222
-6.9006302,107.610398 -6.9006302))',4326),26918),'Indonesia','Coblong, Kota
Bandung','2c0294c5eab821c9','Coblong','city','
http://api.twitter.com/1/geo/id/2c0294c5eab821c9.json')

The st_transform_null function is simply a wrapper around the PostGIS
st_transform function that deals with NULL values. The other fields are all
VARCHARs. This insert is only adding two values. In general, the insert
commands I'm running insert anywhere up to 100 rows each, so they're not
huge.

>
> What server settings have you customized?
> https://wiki.postgresql.org/wiki/Server_Configuration
>

I haven't customized any settings.

>
> If you could get a stack trace or minidump, that would be most helpful:
>
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Windows
>

So I attached the VS debugger, but the server died without raising an
exception in VS. Not sure what's going on here.

>
> Magnus's questions for the reporter of bug #7517 are relevant, too. Does
> the
> system have any antivirus software installed?
>

Testing disabling the AV software now. Will post back.

Google suggests those unadorned messages originate in R. Do the affected
> systems use PL/R? If so ...
>

Nope.

Really appreciate any help you can provide.

Matt


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-04 08:39:16
Message-ID: 509629B4.9040200@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/04/2012 08:47 AM, Matthew Gerber wrote:
>
> Here is the command that was executing when the 0xC0000409 exception
> was raised:
>
> INSERT INTO places (bounding_box,country,full_name,id,name,type,url)
>
> VALUES
>
> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> 32.771786,-97.034085 32.771786))',4326),26918),'United
> States','Irving,
> TX','dce44ec49eb788f5','Irving','city','http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'
> <http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json%27>),

OK, so you're using PostGIS. What other extensions are loaded? PL/R? Any
other PLs?

Can you show the definition of the table `places`, incuding any
associated triggers, etc? Use `\d+` in psql for the table def'n.

Please also post the query plan. http://explain.depesz.com/ is useful
for this.
>
> So I attached the VS debugger, but the server died without raising an
> exception in VS. Not sure what's going on here.
>
>
Try creating a directory called "crashdumps" in the data directory, at
the same level as "pg_xlog" and "pg_clog" etc. Give the "postgresql"
user the "full control" permission on it. Then run the test again.

Do any minidump files appear in the directory? If so, you can examine
them with windbg or Visual Studio to see where the crash happened.

--
Craig Ringer


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-04 18:54:50
Message-ID: CAOcpbfr6-8ttV_ZYNVNK1vTfux-R-n1U_jmxn-ZsOrE4LMgbqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 4, 2012 at 3:39 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 11/04/2012 08:47 AM, Matthew Gerber wrote:
>
>
> Here is the command that was executing when the 0xC0000409 exception was
> raised:
>
> INSERT INTO places (bounding_box,country,full_name,id,name,type,url)
>
> VALUES
>
> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving,
> TX','dce44ec49eb788f5','Irving','city','
> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),
>
>
> OK, so you're using PostGIS. What other extensions are loaded? PL/R? Any
> other PLs?
>

PostGIS is the only extension that I added to the default configuration. I
didn't change anything else.

>
> Can you show the definition of the table `places`, incuding any associated
> triggers, etc? Use `\d+` in psql for the table def'n.
>

Here the definition of the places table:

twitter=# \d+ places
Table "public.places"
Column | Type | Modifiers | Storage | Stats
target |Description
--------------+-------------------------+-----------+----------+--------------+-------------
bounding_box | geometry(Polygon,26918) | | main
| |
country | character varying | | extended
| |
full_name | character varying | | extended
| |
id | character varying | not null | extended
| |
name | character varying | | extended
| |
type | character varying | | extended
| |
url | character varying | | extended
| |

Indexes:
"places_pkey" PRIMARY KEY, btree (id)
"places_bounding_box_idx" gist (bounding_box)
"places_type_idx" btree (type)
Referenced by:
TABLE "tweets" CONSTRAINT "tweets_place_id_fkey" FOREIGN KEY (place_id)
REFERENCES places(id) ON DELETE CASCADE
Has OIDs: no

>
> Please also post the query plan. http://explain.depesz.com/ is useful for
> this.
>

Here is the query plan:

QUERY PLAN

Insert on public.places (cost=0.00..0.01 rows=1 width=0) (actual
time=1.000..1.000 rows=0 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.001..0.001
rows=1 loops=1)
Output:
'01030000202669000001000000050000005E8705A4E32C38C1EE553AE6E95E
4D41086A91990B1B38C11620AF9784874D41FCA5741676E437C19436654287814D41C43E11332BF6
37C17C863746F0584D415E8705A4E32C38C1EE553AE6E95E4D41'::geometry(Polygon,26918),
'United States'::character varying, 'Irving, TX'::character varying,
'dce44ec49e
b788f5'::character varying, 'Irving'::character varying, 'city'::character
varyi
ng, 'http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'::charactervarying
Total runtime: 1.157 ms
(4 rows)

>
>
> So I attached the VS debugger, but the server died without raising an
> exception in VS. Not sure what's going on here.
>
>>
>> Try creating a directory called "crashdumps" in the data directory, at
> the same level as "pg_xlog" and "pg_clog" etc. Give the "postgresql" user
> the "full control" permission on it. Then run the test again.
>

Running it now.

>
> Do any minidump files appear in the directory? If so, you can examine them
> with windbg or Visual Studio to see where the crash happened.
>

Will try it.

Thanks for your help so far, guys. Hopefully we get somewhere on this...

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-04 19:30:38
Message-ID: 21226.1352057438@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
>> Here is the command that was executing when the 0xC0000409 exception was
>> raised:
>> INSERT INTO places (bounding_box,country,full_name,id,name,type,url)
>> VALUES
>> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
>> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
>> 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving,
>> TX','dce44ec49eb788f5','Irving','city','
>> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),

Assuming that 0xC0000409 does actually represent a stack-overrun
condition, it doesn't seem like there are very many ways that could
happen on such a simple command. The best theory that is coming to mind
is that you hit some corner case in the GiST code that is causing the
index-entry-insertion attempt to recurse infinitely, or at least enough
times to hit the stack limit. (I exclude the theory of infinite
recursion in the btree indexes only on the grounds that those are so
much better tested than GiST that the idea seems improbable.) It's not
clear yet whether the bug is in the generic GiST code or in the
PostGIS-provided index operator class.

If that is the explanation, then hitting the crash would likely depend
not only on the specific bounding_box value being inserted, but also on
the previous state of the places_bounding_box_idx index, which could
make it darn hard to reproduce. If you can't easily create a reproducer
script, I think we'll have to ask you for a stack trace from the crash.
See
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend

regards, tom lane


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-05 14:10:30
Message-ID: CAOcpbfpGKz025BpbH57dXMswziRv8m+5HOMiBkb1q=0S2OoEsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 4, 2012 at 3:39 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:

> On 11/04/2012 08:47 AM, Matthew Gerber wrote:
>
>
> So I attached the VS debugger, but the server died without raising an
> exception in VS. Not sure what's going on here.
>
>>
>> Try creating a directory called "crashdumps" in the data directory, at
> the same level as "pg_xlog" and "pg_clog" etc. Give the "postgresql" user
> the "full control" permission on it. Then run the test again.
>
> Do any minidump files appear in the directory? If so, you can examine them
> with windbg or Visual Studio to see where the crash happened.
>

I did this but nothing appears in crashdumps after the server crashes. The
latest test I did included the addition of this directory and the disabling
of my antivirus software. Nothing seems to have changed. Following Tom's
suggestion, I'll try to get a stack trace again (last time didn't produce
anything).

The only other thing I've noticed is that the crash always occurs when
inserting into the "places" table (definition in previous email), even
though there are two other tables that are also receiving inserts. This is
odd to me. Any thoughts?

Matt


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-05 14:15:52
Message-ID: CAHyXU0xNNy67rBxvz7F1hOB4fiTBNOsbFDAYet52faYe2P5pAA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 2, 2012 at 8:00 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> hm, several times over the last couple of months (both on postgres 9.1
>> and 9.2), i've seen a similar crash, but on linux. It hits the log
>> like this:
>>
>> Execution halted (~ 200x)
>> Error: segfault from C stack overflow
>> Execution halted (~ 30x)
>> LOG: server process (PID 19882) was terminated by signal 11: Segmentation fault
>> LOG: terminating any other active server processes
>
>> note the lack of LOG in 'Execution halted', etc. This has happened
>> several times, on different servers using different workloads (but
>> always under load). As of yet, I've been unable to get a core but I
>> hope to get one next time it happens. I wonder if it's a similar
>> cause?
>
> Google suggests those unadorned messages originate in R. Do the affected
> systems use PL/R? If so ...

yes -- they do. I was pretty certain that no R code was running at
the time of the crash but not 100% sure. That's a big clue -- thanks.
Investigating...

merlin


From: Noah Misch <noah(at)leadboat(dot)com>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 05:19:51
Message-ID: 20121111051951.GA1473@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 05, 2012 at 09:10:30AM -0500, Matthew Gerber wrote:
> On Sun, Nov 4, 2012 at 3:39 AM, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> wrote:
> > On 11/04/2012 08:47 AM, Matthew Gerber wrote:
> > So I attached the VS debugger, but the server died without raising an
> > exception in VS. Not sure what's going on here.

Not sure either. I attached WinDbg to a client backend and directed that
backend to call a function written to trigger the same exception. It caught
the exception and reported a credible stack trace. I get the same outcome
using Visual Studio Express 2012 for Windows Desktop.

> >> Try creating a directory called "crashdumps" in the data directory, at
> > the same level as "pg_xlog" and "pg_clog" etc. Give the "postgresql" user
> > the "full control" permission on it. Then run the test again.
> >
> > Do any minidump files appear in the directory? If so, you can examine them
> > with windbg or Visual Studio to see where the crash happened.
>
> I did this but nothing appears in crashdumps after the server crashes. The
> latest test I did included the addition of this directory and the disabling
> of my antivirus software. Nothing seems to have changed. Following Tom's
> suggestion, I'll try to get a stack trace again (last time didn't produce
> anything).

I now see that this exception cannot yield a minidump; the CRT restores the
default handler before raising it. Since this exception is intended to avert
a security exposure, perhaps Microsoft reasoned that allowing application code
to regain control would dilute that benefit. That choice is certainly
inconvenient for us, though.

> The only other thing I've noticed is that the crash always occurs when
> inserting into the "places" table (definition in previous email), even
> though there are two other tables that are also receiving inserts. This is
> odd to me. Any thoughts?

That's not intrinsically surprising unless, say, the tables have the same
structure and receive the same data.

Thanks,
nm


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 05:23:24
Message-ID: 20121111052324.GB1473@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 04, 2012 at 02:30:38PM -0500, Tom Lane wrote:
> Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> >> Here is the command that was executing when the 0xC0000409 exception was
> >> raised:
> >> INSERT INTO places (bounding_box,country,full_name,id,name,type,url)
> >> VALUES
> >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> >> 32.771786,-97.034085 32.771786))',4326),26918),'United States','Irving,
> >> TX','dce44ec49eb788f5','Irving','city','
> >> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),
>
> Assuming that 0xC0000409 does actually represent a stack-overrun
> condition,

It signifies scribbling past the end of the frame's local variables:
http://msdn.microsoft.com/en-us/library/8dbf701c.aspx


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 15:10:31
Message-ID: CAOcpbfoAceM0FKCRKY=SghRRsWUKnGkVj+9OB55NUMLqmB_-eQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Sun, Nov 04, 2012 at 02:30:38PM -0500, Tom Lane wrote:
> > Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> > >> Here is the command that was executing when the 0xC0000409 exception
> was
> > >> raised:
> > >> INSERT INTO places (bounding_box,country,full_name,id,name,type,url)
> > >> VALUES
> > >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> > >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> > >> 32.771786,-97.034085 32.771786))',4326),26918),'United
> States','Irving,
> > >> TX','dce44ec49eb788f5','Irving','city','
> > >> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),
> >
> > Assuming that 0xC0000409 does actually represent a stack-overrun
> > condition,
>
> It signifies scribbling past the end of the frame's local variables:
> http://msdn.microsoft.com/en-us/library/8dbf701c.aspx
>

A slight update on this: previously, my insert code involved a long
"SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. If this
command was too long, I would get a stack depth exception thrown back to my
client application. I reduced the length of the command to eliminate the
client-side exceptions, but on some occasions I would still get the
0xC0000409 crash on the server side. I have eliminated the long "SELECT ...
UNION ALL ... " command, and I now get no errors on either side. So it
seems like long commands like this were actually causing the server-side
crashes. The proper behavior would seem to be throwing the exception back
to the client application instead of crashing the server.

Hope this helps...

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 16:19:04
Message-ID: 23087.1352650744@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> It signifies scribbling past the end of the frame's local variables:
>> http://msdn.microsoft.com/en-us/library/8dbf701c.aspx

> A slight update on this: previously, my insert code involved a long
> "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command.

How long is "long"?

regards, tom lane


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 16:30:32
Message-ID: CAOcpbfoOv+_W6wuhhMtX+Rk1vJ7-Xc2Ov01w5EbgsbqOkq3WLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> > On Sun, Nov 11, 2012 at 12:23 AM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> >> It signifies scribbling past the end of the frame's local variables:
> >> http://msdn.microsoft.com/en-us/library/8dbf701c.aspx
>
> > A slight update on this: previously, my insert code involved a long
> > "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command.
>
> How long is "long"?
>

I was seeing queries with around 5000-7000 "UNION ALL" statements.

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 17:22:24
Message-ID: 24361.1352654544@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> How long is "long"?

> I was seeing queries with around 5000-7000 "UNION ALL" statements.

Hm. I experimented with test queries created like so:

perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b, 3 c\n" foreach (1..8200);' | psql

On the machine I tried this on, it works up to about 8200 and then fails
in the way I'd expect:

ERROR: stack depth limit exceeded
HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.

But then when I cranked it up to 80000, kaboom:

connection to server was lost

Inspection of the core dump shows transformSetOperationTree is the
problem --- it's recursing but lacks a check_stack_depth test.
So that's easy to fix, but I wonder why the critical depth limit seems
to be so much less on your machine. I get the expected error up to
about 65000 UNION ALLs --- why is yours crashing at a tenth of that?

regards, tom lane


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 18:10:04
Message-ID: CAOcpbfoDAqpNDbLgCHdoMCgzZDGQXq2gys_AJtX3FWv6=3RGiQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 12:22 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> > On Sun, Nov 11, 2012 at 11:19 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> How long is "long"?
>
> > I was seeing queries with around 5000-7000 "UNION ALL" statements.
>
> Hm. I experimented with test queries created like so:
>
> perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b,
> 3 c\n" foreach (1..8200);' | psql
>
> On the machine I tried this on, it works up to about 8200 and then fails
> in the way I'd expect:
>
> ERROR: stack depth limit exceeded
> HINT: Increase the configuration parameter "max_stack_depth" (currently
> 2048kB), after ensuring the platform's stack depth limit is adequate.
>
> But then when I cranked it up to 80000, kaboom:
>
> connection to server was lost
>
> Inspection of the core dump shows transformSetOperationTree is the
> problem --- it's recursing but lacks a check_stack_depth test.
> So that's easy to fix, but I wonder why the critical depth limit seems
> to be so much less on your machine. I get the expected error up to
> about 65000 UNION ALLs --- why is yours crashing at a tenth of that?
>

Tom,

Interesting. I really have no idea why mine seemed to fail so much sooner.
I recalled my 5k-7k figure from memory, so I might be off on that, but
probably not by an order of magnitude. In any case, it sounds like you know
how to fix the problem. Should I file this as a bug report or will you take
care of it from here?

Best,
Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 18:22:51
Message-ID: 26195.1352658171@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> Interesting. I really have no idea why mine seemed to fail so much sooner.
> I recalled my 5k-7k figure from memory, so I might be off on that, but
> probably not by an order of magnitude. In any case, it sounds like you know
> how to fix the problem. Should I file this as a bug report or will you take
> care of it from here?

I will fix the crash I'm seeing; I'm just not 100% convinced it's the
same crash you're seeing. It'd be useful if some other folk would poke
at similar examples on Windows to see if there's something unexpected
happening with stack depth checks there.

regards, tom lane


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 19:35:57
Message-ID: 20121111193557.GC5720@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 12:22:24PM -0500, Tom Lane wrote:
> perl -e 'print "SELECT 1 a, 2 b, 3 c\n"; print "UNION ALL SELECT 1 a, 2 b, 3 c\n" foreach (1..8200);' | psql
>
> On the machine I tried this on, it works up to about 8200 and then fails
> in the way I'd expect:
>
> ERROR: stack depth limit exceeded
> HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
>
> But then when I cranked it up to 80000, kaboom:
>
> connection to server was lost

I tried this test case on Windows Server 2008 (x64). It hit max_stack_depth
at 9000 UNIONs and crashed at 10000. When I run it under a debugger, the
debugger reports exception 0xC00000FD (STATUS_STACK_OVERFLOW). Run normally,
the server log reports exception 0xC0000005 (STATUS_ACCESS_VIOLATION).

> Inspection of the core dump shows transformSetOperationTree is the
> problem --- it's recursing but lacks a check_stack_depth test.
> So that's easy to fix, but I wonder why the critical depth limit seems
> to be so much less on your machine. I get the expected error up to
> about 65000 UNION ALLs --- why is yours crashing at a tenth of that?

So, I can reproduce the lower threshold, but the exception type does not agree
with the one Matthew observed.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 19:43:47
Message-ID: 20121111194347.GD5720@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 10:10:31AM -0500, Matthew Gerber wrote:
> > > Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> > > >> Here is the command that was executing when the 0xC0000409 exception
> > was
> > > >> raised:
> > > >> INSERT INTO places (bounding_box,country,full_name,id,name,type,url)
> > > >> VALUES
> > > >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> > > >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> > > >> 32.771786,-97.034085 32.771786))',4326),26918),'United
> > States','Irving,
> > > >> TX','dce44ec49eb788f5','Irving','city','
> > > >> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),

> A slight update on this: previously, my insert code involved a long
> "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. If this
> command was too long, I would get a stack depth exception thrown back to my
> client application. I reduced the length of the command to eliminate the
> client-side exceptions, but on some occasions I would still get the
> 0xC0000409 crash on the server side. I have eliminated the long "SELECT ...
> UNION ALL ... " command, and I now get no errors on either side. So it
> seems like long commands like this were actually causing the server-side
> crashes. The proper behavior would seem to be throwing the exception back
> to the client application instead of crashing the server.

Above, you quoted an INSERT ... VALUES of two rows. Have you observed an
exception-0xC0000409 crash with an INSERT ... VALUES query, or only with an
"INSERT ... SELECT ... thousands of UNION" query?


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-11 20:01:51
Message-ID: CAOcpbfrzxaDDWxY0qrssDstYUydCQCSpEKhXi5hQHBRe-jjWtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 2:43 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:

> On Sun, Nov 11, 2012 at 10:10:31AM -0500, Matthew Gerber wrote:
> > > > Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com> writes:
> > > > >> Here is the command that was executing when the 0xC0000409
> exception
> > > was
> > > > >> raised:
> > > > >> INSERT INTO places
> (bounding_box,country,full_name,id,name,type,url)
> > > > >> VALUES
> > > > >> (st_transform_null(ST_GeometryFromText('POLYGON((-97.034085
> > > > >> 32.771786,-97.034085 32.953966,-96.888789 32.953966,-96.888789
> > > > >> 32.771786,-97.034085 32.771786))',4326),26918),'United
> > > States','Irving,
> > > > >> TX','dce44ec49eb788f5','Irving','city','
> > > > >> http://api.twitter.com/1/geo/id/dce44ec49eb788f5.json'),
>
> > A slight update on this: previously, my insert code involved a long
> > "SELECT ... UNION ALL ... SELECT ... UNION ALL ..." command. If this
> > command was too long, I would get a stack depth exception thrown back to
> my
> > client application. I reduced the length of the command to eliminate the
> > client-side exceptions, but on some occasions I would still get the
> > 0xC0000409 crash on the server side. I have eliminated the long "SELECT
> ...
> > UNION ALL ... " command, and I now get no errors on either side. So it
> > seems like long commands like this were actually causing the server-side
> > crashes. The proper behavior would seem to be throwing the exception back
> > to the client application instead of crashing the server.
>
> Above, you quoted an INSERT ... VALUES of two rows. Have you observed an
> exception-0xC0000409 crash with an INSERT ... VALUES query, or only with an
> "INSERT ... SELECT ... thousands of UNION" query?
>

Every time the server crashed with 0xC0000409, the log reported that it was
running the simple INSERT command (two rows) that I started this thread
with. However, this didn't make any sense to me given the simplicity of the
INSERT command and the fact that the error indicated a stack overflow. So I
removed the long "SELECT ... UNION ALL ..." command since it seemed more
relevant to the error, and the process has been running continuously for a
few days now.

To answer your question directly: I was seeing the server crash when using
the simple INSERT and long "SELECT ... UNION ..." (these commands are
issued independently at different points in the program). Now my program is
only using the simple INSERT, and the crashes are gone.

Hope this helps...

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-12 01:27:15
Message-ID: 15685.1352683635@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> So, I can reproduce the lower threshold, but the exception type does not agree
> with the one Matthew observed.

I finally got around to looking at the link you provided about error
0xC0000409, and realized that I'd been completely confusing it with
stack overflow --- but actually, it's a report that something scribbled
past the end of a finite-size local-variable array. So I now think that
Matthew's stumbled across two completely independent bugs, and we've
fixed only one of them. The 0xC0000409 error is something else, and
possibly a lot worse since it could conceivably be a security issue.

It still seems likely that the actual location of the bug is either
in PostGIS or in the GIST index code, but without the ability to
reproduce the failure it's awfully hard to find it. Matthew, could
you try a bit harder to find a self-contained test case that produces
that error?

regards, tom lane


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-11-12 02:45:54
Message-ID: CAOcpbfrRvr54XnzCbOb3Hrb7wF1rT-aL8N=jfYkc1QZgS6ufwQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 11, 2012 at 8:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Noah Misch <noah(at)leadboat(dot)com> writes:
> > So, I can reproduce the lower threshold, but the exception type does not
> agree
> > with the one Matthew observed.
>
> I finally got around to looking at the link you provided about error
> 0xC0000409, and realized that I'd been completely confusing it with
> stack overflow --- but actually, it's a report that something scribbled
> past the end of a finite-size local-variable array. So I now think that
> Matthew's stumbled across two completely independent bugs, and we've
> fixed only one of them. The 0xC0000409 error is something else, and
> possibly a lot worse since it could conceivably be a security issue.
>
> It still seems likely that the actual location of the bug is either
> in PostGIS or in the GIST index code, but without the ability to
> reproduce the failure it's awfully hard to find it. Matthew, could
> you try a bit harder to find a self-contained test case that produces
> that error?
>
> regards, tom lane
>

Sure, it might take me a while to find time but I'll keep it on my list.

Matt


From: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-12-09 19:09:21
Message-ID: CAOcpbfoWKo0mvBqvC5oAV2a9-P+exa79=OLOSC0p5NQTS1Fosw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

I have successfully isolated this error and created a simple SQL script to
reproduce it. Just to recap - this script will cause a server crash with
exception 0xC0000409 as described in previous emails. The crux of the
problem seems to be my creation / use of the function st_transform_null. My
intent with this function is to wrap the st_transform function provided by
PostGIS, but account for the situation where the argument to be transformed
is NULL. In this situation, st_transform throws an internal_error, which my
function catches and returns NULL for. The error / crash is not caused by a
NULL argument; rather, it is caused by the final value in the attached
script's INSERT statement, which contains a lat/lon pair that is beyond
PostGIS's range. I'm not questioning whether this value is actually outside
the legal range, but I do not think such an input should cause the server
to crash completely.

Here are the steps to reproduce the crash:

1) Create a new instance of a 9.2 server (Windows 64-bit), and a new
database (call it test) with the PostGIS extension.

2) Run the script:

psql -U postgres -d test -f C:\server_crash.sql

You should see the following:

psql:C:/server_crash.sql:31: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:C:/server_crash.sql:31: connection to server was lost

3) Check your log for the error.

I hope this helps. It took me quite a while to track down the problem so I
hope someone can figure out what is going on under the hood. It seems to be
a pretty significant problem.

Cheers,
Matt

On Sun, Nov 11, 2012 at 9:45 PM, Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>wrote:

>
>
> On Sun, Nov 11, 2012 at 8:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Noah Misch <noah(at)leadboat(dot)com> writes:
>> > So, I can reproduce the lower threshold, but the exception type does
>> not agree
>> > with the one Matthew observed.
>>
>> I finally got around to looking at the link you provided about error
>> 0xC0000409, and realized that I'd been completely confusing it with
>> stack overflow --- but actually, it's a report that something scribbled
>> past the end of a finite-size local-variable array. So I now think that
>> Matthew's stumbled across two completely independent bugs, and we've
>> fixed only one of them. The 0xC0000409 error is something else, and
>> possibly a lot worse since it could conceivably be a security issue.
>>
>> It still seems likely that the actual location of the bug is either
>> in PostGIS or in the GIST index code, but without the ability to
>> reproduce the failure it's awfully hard to find it. Matthew, could
>> you try a bit harder to find a self-contained test case that produces
>> that error?
>>
>> regards, tom lane
>>
>
> Sure, it might take me a while to find time but I'll keep it on my list.
>
> Matt
>
>

Attachment Content-Type Size
server_crash.sql application/octet-stream 1.6 KB

From: Noah Misch <noah(at)leadboat(dot)com>
To: Matthew Gerber <gerber(dot)matthew(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Unresolved error 0xC0000409 on Windows Server
Date: 2012-12-17 02:45:20
Message-ID: 20121217024520.GB5291@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 09, 2012 at 02:09:21PM -0500, Matthew Gerber wrote:
> In this situation, st_transform throws an internal_error, which my
> function catches and returns NULL for. The error / crash is not caused by a
> NULL argument; rather, it is caused by the final value in the attached
> script's INSERT statement, which contains a lat/lon pair that is beyond
> PostGIS's range. I'm not questioning whether this value is actually outside
> the legal range, but I do not think such an input should cause the server
> to crash completely.

The server should not crash, no. However, the facts that PostGIS reported an
internal error and the crash is responsive to your choice of geographic inputs
increases the chance that the problem lies in PostGIS code, not PostgreSQL
core code.

> Here are the steps to reproduce the crash:
>
> 1) Create a new instance of a 9.2 server (Windows 64-bit), and a new
> database (call it test) with the PostGIS extension.
>
> 2) Run the script:
>
> psql -U postgres -d test -f C:\server_crash.sql
>
> You should see the following:
>
> psql:C:/server_crash.sql:31: server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> psql:C:/server_crash.sql:31: connection to server was lost
>
> 3) Check your log for the error.

I tried this test procedure, but I could not reproduce the crash. PostgreSQL:
one click installer postgresql-9.2.2-1-windows-x64.exe; PostGIS: v2.0.1 from
Stack Builder; OS: Windows Server 2008 R2 Datacenter. I needed the workaround
in the last comment of this[1] bug report to get a working installation. At
that point, your test procedure completes without error. What specific
versions are involved in your installation?

Could you try again to get a minidump and stack trace? Connect to your test
database with psql; run "SELECT pg_backend_pid();"; open Visual Studio; select
Debug -> Attach to process...; select the postgres.exe process with matching
ID. Run your test case; when the exception window pops up, select "Break".
If the stack trace does not contain full symbol information, right click on
some of the incomplete lines and select Load Symbols From -> Symbol Path;
navigate to the location of postgres.pdb. You can select "Save Dump As..."
from the Debug menu to create the minidump.

Thanks,
nm

[1] http://trac.osgeo.org/postgis/ticket/1824