Re: 2PC-induced lockup

Lists: pgsql-hackers
From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: 2PC-induced lockup
Date: 2007-07-10 13:14:59
Message-ID: 200707101515.00782.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following command sequence appears to lock up the database system:

BEGIN;
LOCK pg_authid;
PREPARE TRANSACTION 'foo';
\q

After that you can't connect anymore, even in single-user mode. The only way
I could find is to clear out the pg_twophase directory, but I'm not sure
whether it is safe to do that.

Should this be prevented somehow, and is there a better recovery path?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Hans-Juergen Schoenig <postgres(at)cybertec(dot)at>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-10 13:21:07
Message-ID: E4E8FDC4-FF7F-4B89-BE8B-80780A263FB5@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

is it good to allow locks on system tables at all?
i am not so sure. have seen some disaster in the past with that. just
consider somebody placing ACCESS EXCLUSIVE LOCK on a system table. it
is basically denial of service.

best regards,

hans

On Jul 10, 2007, at 3:14 PM, Peter Eisentraut wrote:

> The following command sequence appears to lock up the database system:
>
> BEGIN;
> LOCK pg_authid;
> PREPARE TRANSACTION 'foo';
> \q
>
> After that you can't connect anymore, even in single-user mode.
> The only way
> I could find is to clear out the pg_twophase directory, but I'm not
> sure
> whether it is safe to do that.
>
> Should this be prevented somehow, and is there a better recovery path?
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Cybertec Geschwinde & Schönig GmbH
Gröhrmühlgasse 26, 2700 Wiener Neustadt
Tel: +43/1/205 10 35 / 340
www.postgresql.at, www.cybertec.at


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-10 14:41:31
Message-ID: 19475.1184078491@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> The following command sequence appears to lock up the database system:
> BEGIN;
> LOCK pg_authid;
> PREPARE TRANSACTION 'foo';
> \q

> After that you can't connect anymore, even in single-user mode. The
> only way I could find is to clear out the pg_twophase directory, but
> I'm not sure whether it is safe to do that.

> Should this be prevented somehow, and is there a better recovery path?

AFAICS this is just one of many ways in which a superuser can shoot
himself in the foot; I'm not eager to try to prevent it.

Right offhand, clearing pg_twophase while the system is stopped should
be safe enough.

regards, tom lane


From: tomas(at)tuxteam(dot)de
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 04:07:13
Message-ID: 20070711040713.GA19967@www.trapp.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tue, Jul 10, 2007 at 10:41:31AM -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > The following command sequence appears to lock up the database system:
> > BEGIN;
> > LOCK pg_authid;
> > PREPARE TRANSACTION 'foo';
> > \q
>
> > After that you can't connect anymore, even in single-user mode. The
> > only way I could find is to clear out the pg_twophase directory, but
> > I'm not sure whether it is safe to do that.
>
> > Should this be prevented somehow, and is there a better recovery path?
>
> AFAICS this is just one of many ways in which a superuser can shoot
> himself in the foot; I'm not eager to try to prevent it.
>
> Right offhand, clearing pg_twophase while the system is stopped should
> be safe enough.

It might make sense then to clear the pg_twophase directory on DB
startup. Nobody would expect the locks to persist a database restart --
or am I way off?

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGlFdxBcgs9XrR2kYRAp9/AJ4s8fBkhtaxqfu0QxBhSN2lCi++zgCfRsS9
Jpjv6513ubPtfldf2fItzj0=
=KAW0
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: tomas(at)tuxteam(dot)de
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 04:38:09
Message-ID: 17090.1184128689@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tomas(at)tuxteam(dot)de writes:
> On Tue, Jul 10, 2007 at 10:41:31AM -0400, Tom Lane wrote:
>> Right offhand, clearing pg_twophase while the system is stopped should
>> be safe enough.

> It might make sense then to clear the pg_twophase directory on DB
> startup.

<blink> I fear you have 100% misunderstood the point. The *only*
reason for that feature is to survive DB crashes.

regards, tom lane


From: tomas(at)tuxteam(dot)de
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: tomas(at)tuxteam(dot)de, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 08:15:53
Message-ID: 20070711081553.GB21484@www.trapp.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote:
> tomas(at)tuxteam(dot)de writes:
[...]
> > It might make sense then to clear the pg_twophase directory on DB
> > startup.
>
> <blink> I fear you have 100% misunderstood the point. The *only*
> reason for that feature is to survive DB crashes.

Ah -- so it is intentional that it keeps the DB from starting again.
OK, then I misunderstood. Sorry for the noise.

Regards
- -- tomás
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFGlJG5Bcgs9XrR2kYRApC9AJsF+wm9z5zJXpZ98ThuV/gn9ozpVwCfbf3L
G4OA0pu3rh/o2rOL/OvZ9bU=
=+fd6
-----END PGP SIGNATURE-----


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 14:43:23
Message-ID: 604pkbxapg.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tomas(at)tuxteam(dot)de writes:
> On Wed, Jul 11, 2007 at 12:38:09AM -0400, Tom Lane wrote:
>> tomas(at)tuxteam(dot)de writes:
> [...]
>> > It might make sense then to clear the pg_twophase directory on DB
>> > startup.
>>
>> <blink> I fear you have 100% misunderstood the point. The *only*
>> reason for that feature is to survive DB crashes.
>
> Ah -- so it is intentional that it keeps the DB from starting again.
> OK, then I misunderstood. Sorry for the noise.

I don't think that is so much "intentional" as it is an "emergent
property."

The usual point to 2PC is that once transactions are PREPAREd, they
*need* to be stored robustly enough to survive even a DB crash.

If one locks certain vital system resources, as part of that PREPAREd
transaction, that evidently causes some problems, alas...

The right resolution to this is not, a priori, evident yet.
--
(format nil "~S(at)~S" "cbbrowne" "linuxdatabases.info")
http://cbbrowne.com/info/linuxdistributions.html
Rules of the Evil Overlord #38. "If an enemy I have just killed has a
younger sibling or offspring anywhere, I will find them and have them
killed immediately, instead of waiting for them to grow up harboring
feelings of vengeance towards me in my old age."
<http://www.eviloverlord.com/>


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 15:28:16
Message-ID: 20070711152816.GN1241@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 11, 2007 at 10:43:23AM -0400, Chris Browne wrote:
> The right resolution to this is not, a priori, evident yet.

_A posteriori_, though, it seems to me the right resolution is "don't
do that" ;-)

A
--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
--Damien Katz


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 15:44:12
Message-ID: 4694FACC.4030800@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Chris Browne wrote:
> If one locks certain vital system resources, as part of that PREPAREd
> transaction, that evidently causes some problems, alas...
>
> The right resolution to this is not, a priori, evident yet.

It's not? I agree with Tom here; this is just one of the numerous things
you can do to screw up your database as a superuser. Why would you LOCK
the pg_auth table, or any other system table for that matter, in the
first place? Let alone in a distributed transaction.

FWIW, deleting the files from pg_twophase is safe when the system is
shut down.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 17:04:51
Message-ID: 20070711170451.GY1241@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote:
> FWIW, deleting the files from pg_twophase is safe when the system is
> shut down.

Is it safe for the PREPAREd TRANSACTIONs? I assume not. That is, in
Peter's presumably experimental case, it might be ok to delete the
files, but on a production system, you'd violate the semantics of 2PC
by doing this?

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
--Damien Katz


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 17:15:12
Message-ID: 46951020.9010606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan wrote:
> On Wed, Jul 11, 2007 at 04:44:12PM +0100, Heikki Linnakangas wrote:
>> FWIW, deleting the files from pg_twophase is safe when the system is
>> shut down.
>
> Is it safe for the PREPAREd TRANSACTIONs? I assume not. That is, in
> Peter's presumably experimental case, it might be ok to delete the
> files, but on a production system, you'd violate the semantics of 2PC
> by doing this?

It's effectively the same as manually issuing a ROLLBACK PREPARED. It
will brake the atomicity of the global transaction, if some branches of
that global transaction in other resource managers have already been
committed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 18:12:29
Message-ID: 20070711181229.GA1241@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:
> It's effectively the same as manually issuing a ROLLBACK PREPARED. It
> will brake the atomicity of the global transaction, if some branches of
> that global transaction in other resource managers have already been
> committed.

But how do you know which file to delete? Is it keyed to the
transaction identifier or something?

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
However important originality may be in some fields, restraint and
adherence to procedure emerge as the more significant virtues in a
great many others. --Alain de Botton


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 19:06:45
Message-ID: 200707112106.46576.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> It's not? I agree with Tom here; this is just one of the numerous
> things you can do to screw up your database as a superuser. Why would
> you LOCK the pg_auth table, or any other system table for that
> matter, in the first place? Let alone in a distributed transaction.

Well, my test case arose from a real application scenario, not an
attempt to destroy my database system.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 19:12:11
Message-ID: 46952B8B.1090401@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut wrote:
> Heikki Linnakangas wrote:
>> It's not? I agree with Tom here; this is just one of the numerous
>> things you can do to screw up your database as a superuser. Why would
>> you LOCK the pg_auth table, or any other system table for that
>> matter, in the first place? Let alone in a distributed transaction.
>
> Well, my test case arose from a real application scenario, not an
> attempt to destroy my database system.

Why does the application LOCK pg_auth?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 19:53:42
Message-ID: 916.1184183622@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Peter Eisentraut wrote:
>> Heikki Linnakangas wrote:
>>> It's not? I agree with Tom here; this is just one of the numerous
>>> things you can do to screw up your database as a superuser. Why would
>>> you LOCK the pg_auth table, or any other system table for that
>>> matter, in the first place? Let alone in a distributed transaction.
>>
>> Well, my test case arose from a real application scenario, not an
>> attempt to destroy my database system.

> Why does the application LOCK pg_auth?

Even if there is a reason for a lock, surely it's not necessary to use
AccessExclusiveLock. A lesser lock would synchronize whatever the heck
it's doing without locking out readers.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 19:55:27
Message-ID: 999.1184183727@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan <ajs(at)crankycanuck(dot)ca> writes:
> On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:
>> It's effectively the same as manually issuing a ROLLBACK PREPARED.

> But how do you know which file to delete?

You don't. In extremis you could probably throw together some
inspection tool that could look though the 2PC records to find out which
file mentioned an exclusive lock on pg_authid ...

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 20:08:12
Message-ID: 200707112208.13392.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:

> Why does the application LOCK pg_auth?

It does it with NOWAIT to determine if some other connection had already
locked it (because it was modifying some roles) in order not to lock up
the program. This (or something like it, because this doesn't work,
after all) is unfortunately necessary because schema changes don't obey
ordinary snapshot rules.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 20:26:34
Message-ID: 46953CFA.5070404@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan wrote:
> On Wed, Jul 11, 2007 at 06:15:12PM +0100, Heikki Linnakangas wrote:
>> It's effectively the same as manually issuing a ROLLBACK PREPARED. It
>> will brake the atomicity of the global transaction, if some branches of
>> that global transaction in other resource managers have already been
>> committed.
>
> But how do you know which file to delete? Is it keyed to the
> transaction identifier or something?

The xid is encoded in the filename. If you can't start up the database
and look at pg_locks, you can't do much other than guess.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 20:54:13
Message-ID: 20070711205413.GE2424@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:
>
> The xid is encoded in the filename. If you can't start up the database
> and look at pg_locks, you can't do much other than guess.

So then in this sort of case, it isn't _really_ safe to delete those
files, because the commitment you made before crash when you accepted
a PREPARE TRANSACTION is going to be gone, which violates the 2PC
rules.

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
When my information changes, I alter my conclusions. What do you do sir?
--attr. John Maynard Keynes


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 21:33:47
Message-ID: 46954CBB.9010804@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan wrote:
> On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:
>> The xid is encoded in the filename. If you can't start up the database
>> and look at pg_locks, you can't do much other than guess.
>
> So then in this sort of case, it isn't _really_ safe to delete those
> files, because the commitment you made before crash when you accepted
> a PREPARE TRANSACTION is going to be gone, which violates the 2PC
> rules.

Yes, though if neither you nor the transaction manager can connect, you
don't have much choice.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 21:41:45
Message-ID: 1184190105.4316.44.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2007-07-10 at 10:41 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > The following command sequence appears to lock up the database system:
> > BEGIN;
> > LOCK pg_authid;
> > PREPARE TRANSACTION 'foo';
> > \q
>
> > After that you can't connect anymore, even in single-user mode. The
> > only way I could find is to clear out the pg_twophase directory, but
> > I'm not sure whether it is safe to do that.
>
> > Should this be prevented somehow, and is there a better recovery path?
>
> AFAICS this is just one of many ways in which a superuser can shoot
> himself in the foot; I'm not eager to try to prevent it.
>
> Right offhand, clearing pg_twophase while the system is stopped should
> be safe enough.

Safe from the perspective of the rest of the system. The prepared
transactions will clearly be lost and that might be worth millions.

I'm concerned that this advice will lead to clearing pg_twophase every
time that the system won't start properly.

I'd be much more comfortable if LOCK TABLE caused a message to the log
if it is executed on any system table. I can't really see a reason to
allow a user the ability to explicitly lock out a system table and would
prefer if that were banned completely. It's DoS if nothing else. A
simple check on LOCK TABLE won't cost much in the normal execution path.

There seems like a number of ways that unresolved prepared transactions
can cause problems. We really need to have startup mention how many
prepared transactions there are, so we have some chance of understanding
and resolving potential problems. Without such a message we might well
experience downtimes of many hours before somebody thinks to check
pg_twophase and that runs against our goal of higher availability.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Andrew Sullivan" <ajs(at)crankycanuck(dot)ca>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 21:47:25
Message-ID: 1184190445.4316.48.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2007-07-11 at 22:33 +0100, Heikki Linnakangas wrote:
> Andrew Sullivan wrote:
> > On Wed, Jul 11, 2007 at 09:26:34PM +0100, Heikki Linnakangas wrote:
> >> The xid is encoded in the filename. If you can't start up the database
> >> and look at pg_locks, you can't do much other than guess.
> >
> > So then in this sort of case, it isn't _really_ safe to delete those
> > files, because the commitment you made before crash when you accepted
> > a PREPARE TRANSACTION is going to be gone, which violates the 2PC
> > rules.
>
> Yes, though if neither you nor the transaction manager can connect, you
> don't have much choice.

True, but I'm worried that this discussion will lead, via Google, to the
impression that if you are having connection problems the best thing to
do is to delete everything in pg_twophase. There are hundreds of other
issues that might prevent connection and it would require significant
expertise to isolate this as the error. I would prefer to explicitly
avoid this kind of error, so that we can return to the idea that
removing pg_twophase is never a requirement.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-11 22:09:55
Message-ID: 3731.1184191795@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> I'd be much more comfortable if LOCK TABLE caused a message to the log
> if it is executed on any system table.

Enabled by "set training_wheels = on", perhaps?

This is really pretty silly to be getting worked up about. The command
in question wouldn't have been allowed at all except to a superuser,
and there are plenty of ways to catastrophically destroy your database
when you are superuser; most of which we will never consider blocking
for the same reasons that Unix systems have never tried to block root
from doing "rm -rf /". I'd say the real design flaw in Peter's
referenced application is that they're running it as superuser.

> There seems like a number of ways that unresolved prepared transactions
> can cause problems. We really need to have startup mention how many
> prepared transactions there are, so we have some chance of understanding
> and resolving potential problems.

While I have no particular objection to such a log entry, I doubt it
will fix anything; how many people will really think to look in the
postmaster log? In any case, most of the problems I've personally run
into with prepared xacts have nothing to do with crashes and so nothing
like that would ever get emitted. (The typical way I get bitten is to
interrupt the regression tests because I changed my mind about
something, and manage to do this just while the prepared_xacts test has
some open prepared xacts.)

regards, tom lane


From: "Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 02:03:38
Message-ID: 46958BFA.3060500@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>> I'd be much more comfortable if LOCK TABLE caused a message to the log
>> if it is executed on any system table.
>
> Enabled by "set training_wheels = on", perhaps?
>
> This is really pretty silly to be getting worked up about. The command
> in question wouldn't have been allowed at all except to a superuser,
> and there are plenty of ways to catastrophically destroy your database
> when you are superuser; most of which we will never consider blocking
> for the same reasons that Unix systems have never tried to block root
> from doing "rm -rf /". I'd say the real design flaw in Peter's
> referenced application is that they're running it as superuser.

Yeah.. though "lock pg_auth; prepare" looks quite innocent, much more
than say "delete from pg_database" or "rm -rf whatever".
At least to the untrained eye.

I fully agree that that special-casing this particular way to shoot yourself
in the foot is not worth it - but maybe pursuing a more general solution
would be worthwile? Maybe superuser-connections could e.g. ignore
any errors that occur while reading a system table, together with
a big, fat warning, but still allow a logon? That of course depends
on the assumption that basic authentication is possible using just
the information from the flatfiles and pg_hba.conf, which I'm not
sure about.

greetings, Florian Pflug


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 08:19:09
Message-ID: 1184228349.4316.77.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2007-07-11 at 18:09 -0400, Tom Lane wrote:

> > There seems like a number of ways that unresolved prepared transactions
> > can cause problems. We really need to have startup mention how many
> > prepared transactions there are, so we have some chance of understanding
> > and resolving potential problems.
>
> While I have no particular objection to such a log entry, I doubt it
> will fix anything; how many people will really think to look in the
> postmaster log?

Even if it were just you and me. From my perspective, thats enough.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 15:17:25
Message-ID: 200707121117.26171.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday 12 July 2007 04:19, Simon Riggs wrote:
> On Wed, 2007-07-11 at 18:09 -0400, Tom Lane wrote:
> > > There seems like a number of ways that unresolved prepared transactions
> > > can cause problems. We really need to have startup mention how many
> > > prepared transactions there are, so we have some chance of
> > > understanding and resolving potential problems.
> >
> > While I have no particular objection to such a log entry, I doubt it
> > will fix anything; how many people will really think to look in the
> > postmaster log?
>
> Even if it were just you and me. From my perspective, thats enough.

Well, Tom doesn't look at the log files, so I guess your idea is shot...

:-)
--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Michael Paesold <mpaesold(at)gmx(dot)at>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 18:02:55
Message-ID: 46966CCF.4010202@gmx.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Wed, 2007-07-11 at 18:09 -0400, Tom Lane wrote:
>
>>> There seems like a number of ways that unresolved prepared transactions
>>> can cause problems. We really need to have startup mention how many
>>> prepared transactions there are, so we have some chance of understanding
>>> and resolving potential problems.
>> While I have no particular objection to such a log entry, I doubt it
>> will fix anything; how many people will really think to look in the
>> postmaster log?
>
> Even if it were just you and me. From my perspective, thats enough.

At least, such a message seems much more useful than the list of
historic startup messages that were removed recently. Just my two €-cents.

Best Regards
Michael Paesold


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 18:35:49
Message-ID: 20070712183549.GK4606@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 11, 2007 at 10:47:25PM +0100, Simon Riggs wrote:
> expertise to isolate this as the error. I would prefer to explicitly
> avoid this kind of error, so that we can return to the idea that
> removing pg_twophase is never a requirement.

This was pretty much my point. It's one thing to say, "If you are
completely hosed, you will lose some data." But 2PC is making some
pretty strong promises, and I sort of hate it that it's not real hard
to break things in such a way that those promises have to be broken.

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
When my information changes, I alter my conclusions. What do you do sir?
--attr. John Maynard Keynes


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 18:44:24
Message-ID: 20070712184424.GL4606@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 11, 2007 at 06:09:55PM -0400, Tom Lane wrote:
> This is really pretty silly to be getting worked up about. The command
> in question wouldn't have been allowed at all except to a superuser,
> and there are plenty of ways to catastrophically destroy your database
> when you are superuser; most of which we will never consider blocking

I think the problem is it is nowise obvious that LOCK [sometable];
PREPARE TRANSACTION is, as superuser, dangerous in the way that DROP
CASCADE is. Or rm -rf, for that matter. It seems to me that at the
very least, some pretty extensive documentation of the perils of
running 2PC as a superuser ought to be available, and if it's there,
it didn't leap out at me.

But the other problem I see here is that the solution hits more than
just the problematic state. If we have bad pages on disk, for
instance, we zero pages; we don't drop the table. Similarly, it
seems that all that's necessary here is an external tool to grovel
through the prepared transaction files and somehow figure out what
locks, if any, they entail. Wouldn't that be enough to then allow
you to shoot the relevant file, thereby losing only that prepared
transaction instead of all of them? (I'm not suggesting that no data
must ever be lost in this case; just that we should lose the minimum
necessary to make the system work.)

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 2PC-induced lockup
Date: 2007-07-12 19:41:47
Message-ID: 9380.1184269307@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan <ajs(at)crankycanuck(dot)ca> writes:
> But the other problem I see here is that the solution hits more than
> just the problematic state. If we have bad pages on disk, for
> instance, we zero pages; we don't drop the table. Similarly, it
> seems that all that's necessary here is an external tool to grovel
> through the prepared transaction files and somehow figure out what
> locks, if any, they entail.

Something equivalent to pg_filedump or xlogdump for 2PC state files
would be worth having for development/debugging purposes, quite aside
from any possible usefulness for getting out of this type of
predicament. Any volunteers out there to write/maintain such a utility?

regards, tom lane