Re: FATAL: could not reattach to shared memory (Win32)

Lists: pgsql-general
From: Shelby Cain <alyandon(at)yahoo(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-23 21:03:07
Message-ID: 586496.2879.qm@web55408.mail.re4.yahoo.com

>----- Original Message ----
>From: Magnus Hagander <magnus(at)hagander(dot)net>
>To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
>Cc: Terry Yapt <yapt(at)technovell(dot)com>; pgsql-general(at)postgresql(dot)org
>Sent: Thursday, August 23, 2007 3:43:32 PM
>Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
>
>
>8.3 will have a new way to deal with shared mem on win32. It's the same
>underlying tech, but we're no longer trying to squeeze it into an
>emulation of sysv. With a bit of luck, that'll help :-)
>
>//Magnus
>

Wild guess on my part... could that error be the result of an attempt to map shared memory into a process at a fixed location that just happens to already be occupied by a dll that Windows had decided to relocate?

Regards,

Shelby Cain



From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Shelby Cain <alyandon(at)yahoo(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 06:08:44
Message-ID: 46CE75EC.3060202@hagander.net

Shelby Cain wrote:
>> ----- Original Message ----
>> From: Magnus Hagander <magnus(at)hagander(dot)net>
>> To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
>> Cc: Terry Yapt <yapt(at)technovell(dot)com>; pgsql-general(at)postgresql(dot)org
>> Sent: Thursday, August 23, 2007 3:43:32 PM
>> Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
>>
>>
>> 8.3 will have a new way to deal with shared mem on win32. It's the
>> same underlying tech, but we're no longer trying to squeeze it into
>> an emulation of sysv. With a bit of luck, that'll help :-)
>>
>> //Magnus
>>
>
> Wild guess on my part... could that error be the result of an attempt
> to map shared memory into a process at a fixed location that just
> happens to already be occupied by a dll that Windows had decided to
> relocate?

Not that wild a guess, really :-) I'd say it's a very good possibility -
but I have no idea why it'd do that, since all backends load the same
DLLs at that stage.

//Magnus


From: "Trevor Talbot" <quension(at)gmail(dot)com>
To: "Magnus Hagander" <magnus(at)hagander(dot)net>
Cc: "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 12:49:43
Message-ID: 90bce5730708240549y6b42c8a1u4bda5a1892a20e3a@mail.gmail.com

On 8/23/07, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> Shelby Cain wrote:

> > Wild guess on my part... could that error be the result of an attempt
> > to map shared memory into a process at a fixed location that just
> > happens to already be occupied by a dll that Windows had decided to
> > relocate?
>
> Not that wild a guess, really :-) I'd say it's a very good possibility -
> but I have no idea why it'd do that, since all backends load the same
> DLLs at that stage.

Not a valid assumption; you can't rely on consistent VM space among
multiple [non-cloned] processes without a serious amount of effort.
Anything can use that space, it's not just file views. Obviously it
happens to work some of the time, but when it doesn't, it doesn't. I
gather postgres depends on it being at the same address, and fixing
that isn't trivial?

If everything relevant is going through the intriguing
internal_forkexec(), you could probably reserve address space there
before resuming the thread. You'd want to combine this with picking
address space that's less likely to be used before creating the shared
memory section. (Actually, if you're doing that, you might as well
just inject the backend variables too instead of going through the
mapped file gymnastics.)

Not a simple change, but would likely make this particular problem go
away (assuming this is the problem). It's also the first time I've
looked at the source, so perhaps I missed something.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Trevor Talbot" <quension(at)gmail(dot)com>
Cc: "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 14:02:32
Message-ID: 87ir751153.fsf@oxford.xeocode.com

"Trevor Talbot" <quension(at)gmail(dot)com> writes:

> I gather postgres depends on it being at the same address, and fixing that
> isn't trivial?

I haven't been following the rest of the thread so I'm not sure if this is
important. But no, fixing that should be relatively trivial as there are
already some configurations where it's not the case (the EXEC_BACKEND case I
believe). The rest of the system uses a shared memory base pointer and
references everything relative to that.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Trevor Talbot <quension(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 15:06:27
Message-ID: 200708241506.l7OF6R700259@momjian.us

Trevor Talbot wrote:
> On 8/23/07, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > Shelby Cain wrote:
>
> > > Wild guess on my part... could that error be the result of an attempt
> > > to map shared memory into a process at a fixed location that just
> > > happens to already be occupied by a dll that Windows had decided to
> > > relocate?
> >
> > Not that wild a guess, really :-) I'd say it's a very good possibility -
> > but I have no idea why it'd do that, since all backends load the same
> > DLLs at that stage.
>
> Not a valid assumption; you can't rely on consistent VM space among
> multiple [non-cloned] processes without a serious amount of effort.
> Anything can use that space, it's not just file views. Obviously it
> happens to work some of the time, but when it doesn't, it doesn't. I
> gather postgres depends on it being at the same address, and fixing
> that isn't trivial?
>
> If everything relevant is going through the intriguing
> internal_forkexec(), you could probably reserve address space there
> before resuming the thread. You'd want to combine this with picking
> address space that's less likely to be used before creating the shared
> memory section. (Actually, if you're doing that, you might as well
> just inject the backend variables too instead of going through the
> mapped file gymnastics.)
>
> Not a simple change, but would likely make this particular problem go
> away (assuming this is the problem). It's also the first time I've
> looked at the source, so perhaps I missed something.

I think this is accurate. When we created the Win32 native port there
was a lot of concern about how to handle shared memory in the EXEC_BACKEND
case, namely that postmaster children were not copies which had the same
shared memory mappings, but rather were new processes that had to attach
to shared memory at a fixed address.

The WIN32 solution was to create the shared memory in the parent, and
then pass that address value down to the children to use in attaching to
the existing segment. We expected all sorts of problems with this but
in fact it seemed to work fine (most of the time).

As you can see it doesn't work 100% of the time, but it worked more
reliably than we expected. What we have been waiting for is someone
who can recreate a failure so we can track down how to best make it 100%
reliable, and as you can see, we haven't had a flood of problem reports
to track this down.

If you want to help make it 100% we will work with you to find the
solution.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Trevor Talbot <quension(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 15:09:27
Message-ID: 200708241509.l7OF9RL00941@momjian.us

Gregory Stark wrote:
> "Trevor Talbot" <quension(at)gmail(dot)com> writes:
>
> > I gather postgres depends on it being at the same address, and fixing that
> > isn't trivial?
>
> I haven't been following the rest of the thread so I'm not sure if this is
> important. But no, fixing that should be relatively trivial as there are
> already some configurations where it's not the case (the EXEC_BACKEND case I
> believe). The rest of the system uses a shared memory base pointer and
> references everything relative to that.

This is inaccurate, I believe. The original Berkeley code did exec()
for backends and hence allowed shared memory to be at different
addresses for different backends, but we started using fork() and
eliminated much of that capability for performance and clarity reasons,
so right now all backends have to have shared memory at the same
address, and changing this will not be simple.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Trevor Talbot" <quension(at)gmail(dot)com>
Cc: "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 15:46:12
Message-ID: 1225.1187970372@sss.pgh.pa.us

"Trevor Talbot" <quension(at)gmail(dot)com> writes:
> On 8/23/07, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>> Not that wild a guess, really :-) I'd say it's a very good possibility -
>> but I have no idea why it'd do that, since all backends load the same
>> DLLs at that stage.

> Not a valid assumption; you can't rely on consistent VM space among
> multiple [non-cloned] processes without a serious amount of effort.

I'm not sure if you have a specific technical meaning of "clone" in mind
here, but these processes are all executing the identical executable,
and taking care to map the shmem early in execution *before* they load
any DLLs. So it should work. Apparently, it *does* work for a while for
the OP, and then stops working, which is even odder.

> I gather postgres depends on it being at the same address, and fixing
> that isn't trivial?

That's correct, and not having to change it is not negotiable ---
finding a way to make this work was one of the gating factors that
made it practical to have a Windows port at all.

If you've got a specific suggestion for making it more reliable,
we're all ears.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Trevor Talbot" <quension(at)gmail(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 15:56:49
Message-ID: 1462.1187971009@sss.pgh.pa.us

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Trevor Talbot" <quension(at)gmail(dot)com> writes:
>> I gather postgres depends on it being at the same address, and fixing that
>> isn't trivial?

> I haven't been following the rest of the thread so I'm not sure if this is
> important. But no, fixing that should be relatively trivial as there are
> already some configurations where it's not the case (the EXEC_BACKEND case I
> believe). The rest of the system uses a shared memory base pointer and
> references everything relative to that.

That hasn't been the case for quite a few years, and we're not going back.
The pointer-to-offset-and-back gymnastics that that required were
utterly destructive to code readability and maintainability, mainly
because if everything stored in shmem data structures is an "offset"
then you can't get any useful error checking from the compiler about how
you are using the fields. It's like decreeing that every pointer
must be declared "void *" and cast to something else when it's used.

There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
but I think it's mostly just that no one's bothered to rewrite the code
for SHM_QUEUE linked lists. The vast majority of our shmem structures
use regular pointers, and have for years.

regards, tom lane
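
The trade-off Tom describes can be sketched in a few lines of C. This is a minimal illustration, not PostgreSQL's code: ShmemOffset, to_offset(), to_ptr(), and the Lock/Proc/Waiter structs are hypothetical stand-ins for the MAKE_OFFSET/MAKE_PTR style and for real shmem structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical offset type: a byte distance from the segment base. */
typedef size_t ShmemOffset;

/* Hypothetical structures that live inside the shared segment. */
typedef struct Lock { int id; } Lock;
typedef struct Proc { int pid; } Proc;

/* Offset style: valid wherever the segment is mapped, but both fields
 * have the same integer type, so the compiler cannot tell them apart. */
typedef struct WaiterByOffset
{
    ShmemOffset lock;
    ShmemOffset proc;
} WaiterByOffset;

/* Pointer style: each field is type checked, but the stored addresses
 * are only meaningful if the segment sits at the same address in
 * every process. */
typedef struct WaiterByPointer
{
    Lock *lock;
    Proc *proc;
} WaiterByPointer;

/* Convert a pointer inside the segment to an offset from its base. */
static ShmemOffset to_offset(const char *base, const void *ptr)
{
    return (ShmemOffset) ((const char *) ptr - base);
}

/* Convert an offset back to a pointer for a given mapping of the segment. */
static void *to_ptr(char *base, ShmemOffset off)
{
    return base + off;
}
```

With WaiterByOffset, an assignment such as `w.lock = to_offset(seg, proc)` compiles silently because both fields are plain integers. The equivalent mistake with WaiterByPointer (storing a `Proc *` in `lock`) draws a compiler diagnostic, which is the error checking Tom refers to.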


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Trevor Talbot <quension(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 16:04:56
Message-ID: 20070824160456.GN31461@alvh.no-ip.org

Tom Lane wrote:

> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
> but I think it's mostly just that no one's bothered to rewrite the code
> for SHM_QUEUE linked lists. The vast majority of our shmem structures
> use regular pointers, and have for years.

... except that, not knowing that, I wrote part of the new autovac code
using MAKE_PTR/OFFSET, and it needs to be rewritten eventually :-(

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Trevor Talbot" <quension(at)gmail(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 16:17:21
Message-ID: 87sl68zz3i.fsf@oxford.xeocode.com

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
> but I think it's mostly just that no one's bothered to rewrite the code
> for SHM_QUEUE linked lists. The vast majority of our shmem structures
> use regular pointers, and have for years.

Ah, I happened to be recently in that code so I was misled.

So even in EXEC_BACKEND we require that we can attach to the shared memory at
a specified location. Hm.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Trevor Talbot" <quension(at)gmail(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 17:50:37
Message-ID: 3091.1187977837@sss.pgh.pa.us

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
>> but I think it's mostly just that no one's bothered to rewrite the code
>> for SHM_QUEUE linked lists. The vast majority of our shmem structures
>> use regular pointers, and have for years.

> Ah, I happened to be recently in that code so I was misled.

IIRC, the reason for not bothering to change the SHM_QUEUE code (other
than inertia) was that it's a generic linked list package, and so if
it wasn't storing SHMEM_OFFSETs it'd be storing "void *"'s, and so there
didn't seem to be any traction to be gained in terms of compiler error
detection capability. However, if both you and Alvaro were confused
about the liveness of that coding convention, maybe it'd be worth making
a push to eliminate all trace of MAKE_PTR/MAKE_OFFSET. TODO for 8.4?

regards, tom lane


From: Terry Yapt <yapt(at)technovell(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Trevor Talbot <quension(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 19:19:06
Message-ID: 46CF2F2A.3020106@technovell.com

Tom Lane wrote:
> I'm not sure if you have a specific technical meaning of "clone" in mind
> here, but these processes are all executing the identical executable,
> and taking care to map the shmem early in execution *before* they load
> any DLLs. So it should work. Apparently, it *does* work for a while for
> the OP, and then stops working, which is even odder.
>
>
Yes. The Windows system log (application log section) shows no errors
for several days. Then the errors suddenly come back and repeat every
so often, disappear again, and return a few hours later. A few hours
after that, the system goes down.

Curiosity:
======
In the log lines I sent to the list, for example:
  FATAL: could not reattach to shared memory (key=5432001, addr=01D80000): Invalid argument
the "addr=01D80000" value is always the same, even though the system
has been shut down and restarted, and even when the error had been
absent for days.

Greetings.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Trevor Talbot" <quension(at)gmail(dot)com>, "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-24 20:08:51
Message-ID: 871wdszodo.fsf@oxford.xeocode.com

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
>>> but I think it's mostly just that no one's bothered to rewrite the code
>>> for SHM_QUEUE linked lists. The vast majority of our shmem structures
>>> use regular pointers, and have for years.
>
>> Ah, I happened to be recently in that code so I was misled.
>
> IIRC, the reason for not bothering to change the SHM_QUEUE code (other
> than inertia) was that it's a generic linked list package, and so if
> it wasn't storing SHMEM_OFFSETs it'd be storing "void *"'s, and so there
> didn't seem to be any traction to be gained in terms of compiler error
> detection capability. However, if both you and Alvaro were confused
> about the liveness of that coding convention, maybe it'd be worth making
> a push to eliminate all trace of MAKE_PTR/MAKE_OFFSET. TODO for 8.4?

It would also make using gdb to look at the lock queues a bit less of a pain.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: "Trevor Talbot" <quension(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Magnus Hagander" <magnus(at)hagander(dot)net>, "Shelby Cain" <alyandon(at)yahoo(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Terry Yapt" <yapt(at)technovell(dot)com>
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-26 15:06:52
Message-ID: 90bce5730708260806o2b8afa60q3dd5a33567e2848a@mail.gmail.com

On 8/24/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Trevor Talbot" <quension(at)gmail(dot)com> writes:
> > On 8/23/07, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> >> Not that wild a guess, really :-) I'd say it's a very good possibility -
> >> but I have no idea why it'd do that, since all backends load the same
> >> DLLs at that stage.
>
> > Not a valid assumption; you can't rely on consistent VM space among
> > multiple [non-cloned] processes without a serious amount of effort.
>
> I'm not sure if you have a specific technical meaning of "clone" in mind
> here, but these processes are all executing the identical executable,
> and taking care to map the shmem early in execution *before* they load
> any DLLs. So it should work. Apparently, it *does* work for a while for
> the OP, and then stops working, which is even odder.

"Clone" in the same sense as fork(): duplicating a process instead of
regenerating it. Even ignoring things like DLL replacement and
LD_PRELOAD-style options, there's still a lot of opportunity for
dynamic behavior. All DLLs have an initialization routine called by
the loader (and on thread creation), which tends to be used to set up
things you don't want the caller to have to explicitly initialize.
DLLs that maintain global state they share with copies of themselves
in other processes can set up shared memory etc to do that. They can
easily change their behavior based on the environment at the time of
process start.

There are also all the hooks for extension points, such as Winsock
LSPs. Most such things happen only after an explicit initialization
(e.g. WSAStartup() or socket creation in the Winsock case), but
between the C runtime and third-party libraries, it may be happening
when you don't expect it.

All that said, I don't actually have a real-world example of process
VM layout changing like this, especially since you are using it early
to avoid this very problem. I'd love to find out exactly what's going
on in Terry's case, but I haven't come up with a good way to do it
that doesn't disturb his production environment.

> If you've got a specific suggestion for making it more reliable,
> we're all ears.

To elaborate on what I said earlier, internal_forkexec() creates the
process suspended; while it has an execution environment set up, the
loader hasn't done all the DLL linking and initialization yet, so the
address space is relatively untouched. At that point you could use
VirtualAllocEx() to reserve VM space for the shared memory at the
right address, and proceed with the rest of the setup. When the new
backend starts up, it would then VirtualFree() that space immediately
before calling MapViewOfFileEx() on it.
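
The reserve-then-map sequence above can be sketched as follows. This is only an analogy under stated assumptions: it uses POSIX mmap() as a stand-in for the Win32 calls named in the message (a PROT_NONE anonymous mapping in place of a VirtualAllocEx() reservation, a MAP_FIXED mapping in place of VirtualFree() followed by MapViewOfFileEx()), and it reserves within a single process rather than from a parent into a suspended child. The helper names reserve_range(), attach_at(), and make_backing() are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Reserve `size` bytes of address space without making them usable.
 * Later allocations in this process cannot land in the range until
 * the reservation is replaced or unmapped. Returns NULL on failure. */
static void *reserve_range(size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Map the shared object behind `fd` at exactly `addr`, replacing the
 * reservation made earlier. Returns NULL on failure. */
static void *attach_at(void *addr, int fd, size_t size)
{
    void *p = mmap(addr, size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_FIXED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Create an unnamed temporary file of `size` bytes to act as the
 * shared object. Returns a file descriptor, or -1 on failure. */
static int make_backing(size_t size)
{
    FILE *f = tmpfile();
    if (f == NULL)
        return -1;
    int fd = dup(fileno(f));    /* keep the file alive after fclose() */
    fclose(f);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t) size) != 0)
    {
        close(fd);
        return -1;
    }
    return fd;
}
```

One difference worth keeping in mind: MAP_FIXED replaces whatever is mapped at the address in one step, whereas on Win32 the reserved range has to be released before MapViewOfFileEx() can use it, which is why the message has the backend call VirtualFree() immediately beforehand.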

I can probably set up with the 8.3 tree and MSVC to create an
artificial failure, and play with the above as a fix, but I'm not
quite sure when that will be. There's still the issue of verifying it
is the problem on Terry's machine, and figuring out a fix for him.

On 8/24/07, Terry Yapt <yapt(at)technovell(dot)com> wrote:

> Yes. The Windows system log (application log section) shows no errors
> for several days. Then the errors suddenly come back and repeat every
> so often, disappear again, and return a few hours later. A few hours
> after that, the system goes down.
>
> Curiosity:
> ======
> In the log lines I sent to the list, for example:
>   FATAL: could not reattach to shared memory (key=5432001, addr=01D80000): Invalid argument
> the "addr=01D80000" value is always the same, even though the system
> has been shut down and restarted, and even when the error had been
> absent for days.

The environment is consistent then. Whatever is going on, when
postgres first starts things are normal, something just changes later
and the change is temporary. As vague guides, I would look at some
kind of global resource usage/tracking, and scheduled tasks. Do you
see any patterns about WHEN this happens? During high load periods?
Any antivirus or other security type tasks running on the machine?
Any third-party VPN type software? Fast User Switching or Remote
Desktop use?


From: Terry Yapt <yapt(at)technovell(dot)com>
To: Trevor Talbot <quension(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-08-26 16:22:40
Message-ID: 46D1A8D0.8060907@technovell.com

Trevor Talbot wrote:
> The environment is consistent then. Whatever is going on, when
> postgres first starts things are normal, something just changes later
> and the change is temporary. As vague guides, I would look at some
> kind of global resource usage/tracking, and scheduled tasks. Do you
> see any patterns about WHEN this happens? During high load periods?
> Any antivirus or other security type tasks running on the machine?
> Any third-party VPN type software? Fast User Switching or Remote
> Desktop use?
I have spent a lot of time looking for patterns in the system logs,
Apache logs, postgres logs, etc., but I have not found any conclusive
clue.

All I can say is that I also see this kind of error in the PostgreSQL
logs:
'2007-08-21 15:19:21 ERROR: could not open relation 16692/16694/17295:
Invalid argument'
The next log lines show the statement (call it statement X). But
statement X runs fine and gives the right results when I copy and paste
it into any SQL editor connected to that DB.

Those errors do not coincide in time with the FATAL errors we are
discussing in this thread.

I am trying to get the chance to migrate that DB to another server and
use the current server to test whatever we want, but the customer is
reluctant to let me use that server for trial-and-error testing because
it is their mail and web server too. :-(

Although that server is far away from my location, I have remote
console access (UltraVNC) to it if you need me to look for something.

Greetings.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2007-09-14 04:19:38
Message-ID: 200709140419.l8E4Jcv29919@momjian.us


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Magnus Hagander wrote:
> Shelby Cain wrote:
> >> ----- Original Message ----
> >> From: Magnus Hagander <magnus(at)hagander(dot)net>
> >> To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
> >> Cc: Terry Yapt <yapt(at)technovell(dot)com>; pgsql-general(at)postgresql(dot)org
> >> Sent: Thursday, August 23, 2007 3:43:32 PM
> >> Subject: Re: [GENERAL] FATAL: could not reattach to shared memory (Win32)
> >>
> >>
> >> 8.3 will have a new way to deal with shared mem on win32. It's the
> >> same underlying tech, but we're no longer trying to squeeze it into
> >> an emulation of sysv. With a bit of luck, that'll help :-)
> >>
> >> //Magnus
> >>
> >
> > Wild guess on my part... could that error be the result of an attempt
> > to map shared memory into a process at a fixed location that just
> > happens to already be occupied by a dll that Windows had decided to
> > relocate?
>
> Not that wild a guess, really :-) I'd say it's a very good possibility -
> but I have no idea why it'd do that, since all backends load the same
> DLLs at that stage.
>
> //Magnus
>
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Terry Yapt <yapt(at)technovell(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-general(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Trevor Talbot <quension(at)gmail(dot)com>
Subject: Re: FATAL: could not reattach to shared memory
Date: 2007-10-24 20:28:45
Message-ID: 471FAAFD.7050206@technovell.com

Bruce Momjian wrote:
> This has been saved for the 8.4 release:
>
> http://momjian.postgresql.org/cgi-bin/pgpatches_hold

Update:

I installed PostgreSQL 8.2.5 and moved the database from the old server
to a new one. This was 2 weeks ago.

The new server is a Windows 2003 Server running other services too.

So far the problem has gone away, and PostgreSQL is running like a
charm on the new server. :-)

Greetings.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Trevor Talbot <quension(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2008-03-12 18:28:53
Message-ID: 200803121828.m2CISrY22049@momjian.us


Added to TODO:

* Remove use of MAKE_PTR and MAKE_OFFSET macros

http://archives.postgresql.org/pgsql-general/2007-08/msg01510.php

---------------------------------------------------------------------------

Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > "Trevor Talbot" <quension(at)gmail(dot)com> writes:
> >> I gather postgres depends on it being at the same address, and fixing that
> >> isn't trivial?
>
> > I haven't been following the rest of the thread so I'm not sure if this is
> > important. But no, fixing that should be relatively trivial as there are
> > already some configurations where it's not the case (the EXEC_BACKEND case I
> > believe). The rest of the system uses a shared memory base pointer and
> > references everything relative to that.
>
> That hasn't been the case for quite a few years, and we're not going back.
> The pointer-to-offset-and-back gymnastics that that required were
> utterly destructive to code readability and maintainability, mainly
> because if everything stored in shmem data structures is an "offset"
> then you can't get any useful error checking from the compiler about how
> you are using the fields. It's like decreeing that every pointer
> must be declared "void *" and cast to something else when it's used.
>
> There are a few old bits of code that still use MAKE_PTR/MAKE_OFFSET,
> but I think it's mostly just that no one's bothered to rewrite the code
> for SHM_QUEUE linked lists. The vast majority of our shmem structures
> use regular pointers, and have for years.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
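
To make the trade-off Tom Lane describes above concrete, here is a minimal sketch in C. It is not PostgreSQL source: the types `ShmOffset`, `Lock`, `Proc`, `OffsetSlot`, `PointerSlot` and the helper functions are hypothetical stand-ins for the MAKE_PTR/MAKE_OFFSET style, and a malloc'd buffer stands in for the shared memory segment. It only illustrates why storing everything as an "offset" hides mistakes from the compiler, while typed pointers do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; none of this is PostgreSQL code. */
typedef size_t ShmOffset;              /* one type for every reference */

typedef struct Lock { int holders; } Lock;
typedef struct Proc { int pid; } Proc;

/* Offset style: the field types say nothing about what they refer to. */
typedef struct OffsetSlot {
    ShmOffset lock;                    /* offset of a Lock from the base */
    ShmOffset proc;                    /* offset of a Proc from the base */
} OffsetSlot;

/* Pointer style: the compiler knows what each field points at. */
typedef struct PointerSlot {
    Lock *lock;
    Proc *proc;
} PointerSlot;

/* Rough equivalents of the offset/pointer conversions discussed above. */
static void *offset_to_ptr(char *base, ShmOffset off)
{
    return base + off;
}

static ShmOffset ptr_to_offset(char *base, void *p)
{
    return (ShmOffset) ((char *) p - base);
}

/* Returns the lock's holder count, or -1 if allocation fails. */
int demo_offset_vs_pointer(void)
{
    char *base = malloc(256);          /* stand-in for the shmem segment */
    if (base == NULL)
        return -1;

    Lock *lock = (Lock *) (base + 64);
    Proc *proc = (Proc *) (base + 128);
    lock->holders = 3;
    proc->pid = 4242;

    OffsetSlot  oslot = { ptr_to_offset(base, lock), ptr_to_offset(base, proc) };
    PointerSlot pslot = { lock, proc };

    Lock *via_offset  = offset_to_ptr(base, oslot.lock);
    Lock *via_pointer = pslot.lock;
    assert(via_offset == via_pointer);

    /*
     * The mistake below is diagnosed by the compiler in pointer style:
     *     Lock *wrong = pslot.proc;                         (incompatible types)
     * but is accepted silently in offset style:
     *     Lock *wrong = offset_to_ptr(base, oslot.proc);    (void * converts freely)
     */

    int holders = via_offset->holders;
    free(base);
    return holders;
}
```

The offset style keeps working when the segment is mapped at a different address in each process, which is the property Gregory Stark refers to. The pointer style gives that up in exchange for type checking, so every process must attach the segment at the same address.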


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Shelby Cain <alyandon(at)yahoo(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Terry Yapt <yapt(at)technovell(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: FATAL: could not reattach to shared memory (Win32)
Date: 2008-03-12 18:34:06
Message-ID: 200803121834.m2CIY6728682@momjian.us
Lists: pgsql-general


Added to Win32 TODO:

o Diagnose problem where shared memory can sometimes not be
attached by postmaster children

http://archives.postgresql.org/pgsql-general/2007-08/msg01377.php

---------------------------------------------------------------------------

Magnus Hagander wrote:
> Shelby Cain wrote:
> >> ----- Original Message ---- From: Magnus Hagander
> >> <magnus(at)hagander(dot)net> To: Alvaro Herrera
> >> <alvherre(at)commandprompt(dot)com> Cc: Terry Yapt <yapt(at)technovell(dot)com>;
> >> pgsql-general(at)postgresql(dot)org Sent: Thursday, August 23, 2007
> >> 3:43:32 PM Subject: Re: [GENERAL] FATAL: could not reattach to
> >> shared memory (Win32)
> >>
> >>
> >> 8.3 will have a new way to deal with shared mem on win32. It's the
> >> same underlying tech, but we're no longer trying to squeeze it into
> >> an emulation of sysv. With a bit of luck, that'll help :-)
> >>
> >> //Magnus
> >>
> >
> > Wild guess on my part... could that error be the result of an attempt
> > to map shared memory into a process at a fixed location that just
> > happens to already be occupied by a dll that Windows had decided to
> > relocate?
>
> Not that wild a guess, really :-) I'd say it's a very good possibility -
> but I have no idea why it'd do that, since all backends load the same
> DLLs at that stage.
>
> //Magnus
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
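
The failure mode Shelby Cain guesses at above (a child process needs the segment at a fixed address, but something else already occupies that range) can be sketched with a POSIX analogy. This is not the Win32 code path: it swaps in `mmap` with an address hint instead of the Win32 call `MapViewOfFileEx`, and an ordinary anonymous mapping plays the role of the relocated DLL. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to map `len` bytes at exactly `want`.
 * Returns 0 and sets *got on success; returns -1 if the kernel refused
 * or placed the mapping somewhere else.
 */
static int map_at_exact_address(void *want, size_t len, void **got)
{
    /* No MAP_FIXED: the address is only a hint, so an existing mapping
     * at `want` is never overwritten. */
    void *p = mmap(want, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    if (p != want)
    {
        munmap(p, len);                /* wrong place: treat as failure */
        return -1;
    }
    *got = p;
    return 0;
}

/*
 * Returns -1 when the "reattach" fails because the address is occupied,
 * 0 if it unexpectedly succeeds, -2 if the setup mapping itself fails.
 */
int simulate_reattach_collision(void)
{
    size_t len = (size_t) 1 << 16;

    /* Something occupies an address range; in the thread's scenario this
     * would be a DLL that Windows loaded at a different base address. */
    void *occupied = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (occupied == MAP_FAILED)
        return -2;

    /* The "child" insists on that same address for its segment. */
    void *got = NULL;
    int rc = map_at_exact_address(occupied, len, &got);
    assert(rc != 0 || got == occupied);

    if (rc == 0)
        munmap(got, len);
    munmap(occupied, len);
    return rc;
}
```

Because the range is still in use when the second mapping is requested, the request cannot be satisfied at that address, which is the kind of collision that would surface as a "could not reattach" error. Whether this is what actually happened on the reporter's machine is left open in the thread.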