Lists: | pgsql-hackers |
---|
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 17:09:35 |
Message-ID: | 5380.1180544975@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
I just finished giving someone the standard advice to wait a bit before
trying to drop a database that'd just been accessed:
http://archives.postgresql.org/pgsql-general/2007-05/msg01505.php
AFAICT a "real" fix for this would involve making PQfinish() synchronous
(don't return till backend is dead), which doesn't seem like a great
idea. However, it suddenly struck me that we could probably make most
of the problem go away if we put that same wait into DROP DATABASE
itself --- that is, if we see other backends in the target DB, sleep
for a second or two and then recheck before erroring out.
This isn't bulletproof since under high load the other backend might
not get to quit, but it'd surely reduce the frequency of complaints
a great deal. And we could take out the ad-hoc sleeps that are done
in (eg) the contrib regression tests.
Thoughts?
regards, tom lane
From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 17:18:02 |
Message-ID: | 465DB1CA.10204@hagander.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
> I just finished giving someone the standard advice to wait a bit before
> trying to drop a database that'd just been accessed:
> http://archives.postgresql.org/pgsql-general/2007-05/msg01505.php
>
> AFAICT a "real" fix for this would involve making PQfinish() synchronous
> (don't return till backend is dead), which doesn't seem like a great
> idea. However, it suddenly struck me that we could probably make most
> of the problem go away if we put that same wait into DROP DATABASE
> itself --- that is, if we see other backends in the target DB, sleep
> for a second or two and then recheck before erroring out.
>
> This isn't bulletproof since under high load the other backend might
> not get to quit, but it'd surely reduce the frequency of complaints
> a great deal. And we could take out the ad-hoc sleeps that are done
> in (eg) the contrib regression tests.
>
> Thoughts?
An option could be to add a PQfinishWait() API call, and have psql use
this one when passed a special commandline argument (which if I
understood right this guys "commerercial alternative" had). It might be
useful in other cases as well, but I can't really think of one right now :-)
//Magnus
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Magnus Hagander <magnus(at)hagander(dot)net> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 17:35:43 |
Message-ID: | 5837.1180546543@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Tom Lane wrote:
>> AFAICT a "real" fix for this would involve making PQfinish() synchronous
>> (don't return till backend is dead), which doesn't seem like a great
>> idea. However, it suddenly struck me that we could probably make most
>> of the problem go away if we put that same wait into DROP DATABASE
>> itself --- that is, if we see other backends in the target DB, sleep
>> for a second or two and then recheck before erroring out.
> An option could be to add a PQfinishWait() API call, and have psql use
> this one when passed a special commandline argument (which if I
> understood right this guys "commerercial alternative" had). It might be
> useful in other cases as well, but I can't really think of one right now :-)
The trouble with trying to fix this on the client side is that it's not
fixed unless every client behaves that way (all the time). Otherwise
we'll still be hearing the same complaints. "Use this magic little
option on the previous connection" isn't a user-friendly answer.
regards, tom lane
From: | Brian Hurt <bhurt(at)janestcapital(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 17:36:14 |
Message-ID: | 465DB60E.40806@janestcapital.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom Lane wrote:
>I just finished giving someone the standard advice to wait a bit before
>trying to drop a database that'd just been accessed:
>http://archives.postgresql.org/pgsql-general/2007-05/msg01505.php
>
>AFAICT a "real" fix for this would involve making PQfinish() synchronous
>(don't return till backend is dead), which doesn't seem like a great
>idea. However, it suddenly struck me that we could probably make most
>of the problem go away if we put that same wait into DROP DATABASE
>itself --- that is, if we see other backends in the target DB, sleep
>for a second or two and then recheck before erroring out.
>
>This isn't bulletproof since under high load the other backend might
>not get to quit, but it'd surely reduce the frequency of complaints
>a great deal. And we could take out the ad-hoc sleeps that are done
>in (eg) the contrib regression tests.
>
>Thoughts?
>
>
Is this a synchronization issue? I'm wondering if there isn't a better
solution. The problem with waiting is that a) you're going to be
waiting a lot when it's not necessary, and b) the likelyhood you won't
wait long enough (especially under load, as you mentioned).
I'm wondering if something like this would work. When a backend
connects to the database, it increments a semaphore associated with that
database. The last thing it does when exiting is release the semaphore-
which is the backend's way of saying "OK, all done here". The drop
database command checks the semaphore- if it still has a non-zero count,
it fails rather than dropping the database. A possibly optional
argument would have it wait until the semaphore is 0, and then drop the
database. This has the advantage of only waiting long enough.
No idea how practical this would be, tho...
Brian
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Brian Hurt <bhurt(at)janestcapital(dot)com> |
Cc: | pgsql-hackers(at)postgreSQL(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 18:01:44 |
Message-ID: | 6181.1180548104@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Brian Hurt <bhurt(at)janestcapital(dot)com> writes:
> Tom Lane wrote:
>> I just finished giving someone the standard advice to wait a bit before
>> trying to drop a database that'd just been accessed:
>> http://archives.postgresql.org/pgsql-general/2007-05/msg01505.php
> Is this a synchronization issue?
The problem is that the user thinks his previous disconnect is finished
when it may not be --- it's entirely possible in fact that his old
backend hasn't even received the disconnect message yet. So I don't
think it's possible to rely on there being a state change inside the
database indicating that the other guy is about to exit.
Even if we had a semaphore of the sort you suggest, I doubt people would
want DROP DATABASE to wait indefinitely. The real question here is how
long is it reasonable for DROP DATABASE to wait before failing ...
regards, tom lane
From: | Gregory Stark <stark(at)enterprisedb(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 19:59:06 |
Message-ID: | 87wsyqhylx.fsf@oxford.xeocode.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> However, it suddenly struck me that we could probably make most of the
> problem go away if we put that same wait into DROP DATABASE itself --- that
> is, if we see other backends in the target DB, sleep for a second or two and
> then recheck before erroring out.
Is there any way to tell, perhaps from the command string, that the process is
about to start exiting? What stage of exiting is it that we think the kernel
goes to lunch?
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
From: | Josh Berkus <josh(at)agliodbs(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 20:03:10 |
Message-ID: | 200705301303.10150.josh@agliodbs.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Tom,
> Even if we had a semaphore of the sort you suggest, I doubt people would
> want DROP DATABASE to wait indefinitely. The real question here is how
> long is it reasonable for DROP DATABASE to wait before failing ...
10 to 15 seconds, I'd say. Is that going to be long enough for backends to
release, assuming the DB isn't under extreme load?
--
Josh Berkus
PostgreSQL @ Sun
San Francisco
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Gregory Stark <stark(at)enterprisedb(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-30 20:27:40 |
Message-ID: | 18407.1180556860@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Is there any way to tell, perhaps from the command string, that the process is
> about to start exiting? What stage of exiting is it that we think the kernel
> goes to lunch?
I haven't really done any detailed investigation, but I would think that
a simple process exit (when there's not temp tables to drop or anything
like that) should complete within one scheduler timeslice. That would
mean that when this problem occurs, it's usually because the kernel hasn't
scheduled the backend at all since the disconnect message was sent;
which in turn means there is no way at all to know that the backend is
going to exit when it does get a chance to run.
regards, tom lane
From: | "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> |
---|---|
To: | "Brian Hurt" <bhurt(at)janestcapital(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-05-31 08:32:18 |
Message-ID: | E1539E0ED7043848906A8FF995BDA579021B2C35@m0143.s-mxs.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> > However, it suddenly struck me that we could
> >probably make most of the problem go away if we put that same wait
into
> >DROP DATABASE itself --- that is, if we see other backends in the
> >target DB, sleep for a second or two and then recheck before erroring
out.
Yup, waiting in drop database up to 10-30 secs would imho be fine.
Andreas
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Josh Berkus <josh(at)agliodbs(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-06-01 19:41:59 |
Message-ID: | 22344.1180726919@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Even if we had a semaphore of the sort you suggest, I doubt people would
>> want DROP DATABASE to wait indefinitely. The real question here is how
>> long is it reasonable for DROP DATABASE to wait before failing ...
> 10 to 15 seconds, I'd say. Is that going to be long enough for backends to
> release, assuming the DB isn't under extreme load?
While testing this, 10 seconds seemed too long --- more than long enough
for someone to start thinking it's broken. I settled on 5 seconds which
seemed about the edge of the threshold of pain. Our experience with the
buildfarm suggests that 1 second is usually long enough (since that's
the delay we were using in the contrib regression tests, and they don't
fail often on this), so I think it'll be all right at 5.
regards, tom lane
From: | Jim Nasby <decibel(at)decibel(dot)org> |
---|---|
To: | Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at> |
Cc: | "Brian Hurt" <bhurt(at)janestcapital(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgreSQL(dot)org> |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-06-01 23:42:24 |
Message-ID: | B7086A58-B9E7-45F6-85EC-2D1A2BABA5BE@decibel.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On May 31, 2007, at 1:32 AM, Zeugswetter Andreas ADI SD wrote:
>>> However, it suddenly struck me that we could
>>> probably make most of the problem go away if we put that same wait
> into
>>> DROP DATABASE itself --- that is, if we see other backends in the
>>> target DB, sleep for a second or two and then recheck before
>>> erroring
> out.
>
> Yup, waiting in drop database up to 10-30 secs would imho be fine.
Even 10 seconds seems rather long, doesn't it? You'd have to have an
awfully busy system to need to wait more than like 5 seconds for the
closing backend to get scheduled, and it'd be rather ugly to force
someone to wait 30 seconds just to find out that someone's still
connected to the database.
How about starting with 5 seconds and seeing if that takes care of
most situations?
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Jim Nasby <decibel(at)decibel(dot)org> |
Cc: | Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, "Brian Hurt" <bhurt(at)janestcapital(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Ye olde drop-the-database-you-just-left problem |
Date: | 2007-06-02 05:15:44 |
Message-ID: | 732.1180761344@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Jim Nasby <decibel(at)decibel(dot)org> writes:
> How about starting with 5 seconds and seeing if that takes care of
> most situations?
Yeah, I came to that same conclusion ...
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00029.php
regards, tom lane