Why we still see some reports of "could not access transaction status"

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Why we still see some reports of "could not access transaction status"
Date: 2004-10-13 16:18:08
Message-ID: 15292.1097684288@sss.pgh.pa.us
Lists: pgsql-hackers

Having seen a couple recent reports of "could not access status of
transaction" for old, not-obviously-corrupt transaction numbers, I went
looking to see if I could find a way that the system could truncate CLOG
before it's really marked all occurrences of old transaction numbers as
known-dead or known-good.

I found one.

The problem is that there are several places where a tqual.c routine is
called without checking to see if it changed the tuple's commit hint
bits, and without necessarily writing the page immediately after. One
example is the code path in heap_update where we decide that we can't
update the tuple because a concurrent transaction did so. If
HeapTupleSatisfiesUpdate had set the XMIN_COMMITTED or XMAX_COMMITTED
bits, those bits would remain set in the shared buffer, but *the buffer
would not get marked dirty*.

Before PG 7.2 this was not a bug, because the hint bits could always be
set again later. But now, consider this scenario: while the buffer
remains in memory, VACUUM passes over the table. It doesn't find any
changes needed in that page, so it doesn't write the page either. At
completion of the vacuum, we check whether we can truncate CLOG,
discover we can, and do so. At some later point, the in-memory buffer
is discarded, still without having been written. When next read in,
the page contains an un-hinted transaction status that could easily
point to a transaction before the new CLOG boundary. Ooops.
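The sequence of events above can be replayed as a toy timeline. This is a sketch under simplifying assumptions. `SimClog`, `clog_lookup` and `run_scenario` are hypothetical. The "CLOG" knows only a truncation boundary and treats every transaction at or above it as committed. The XIDs 100 and 200 are made-up values. The function reports whether the tuple's status can still be determined after the page is read back in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define XMIN_COMMITTED 0x0100   /* illustrative flag value */

typedef struct { TransactionId xmin; uint16_t t_infomask; } Tuple;
typedef struct { Tuple in_memory; Tuple on_disk; bool dirty; } SimBuffer;

/* Toy CLOG: status is available only for xids >= oldest_xid; everything in
 * range is assumed committed, to keep the model small. */
typedef struct { TransactionId oldest_xid; } SimClog;
typedef enum { STATUS_COMMITTED, STATUS_TRUNCATED } XidStatus;

static XidStatus clog_lookup(const SimClog *clog, TransactionId xid)
{
    return xid >= clog->oldest_xid ? STATUS_COMMITTED : STATUS_TRUNCATED;
}

/* Replays the email's scenario; returns false where the real system would
 * report "could not access status of transaction". */
static bool run_scenario(bool caller_marks_dirty)
{
    SimClog clog = { .oldest_xid = 1 };
    SimBuffer buf = {0};
    buf.on_disk.xmin = 100;                 /* old, committed, not yet hinted */

    /* Page is read into a shared buffer. */
    buf.in_memory = buf.on_disk;

    /* A visibility check consults CLOG and sets the hint bit in memory. */
    if (clog_lookup(&clog, buf.in_memory.xmin) == STATUS_COMMITTED)
        buf.in_memory.t_infomask |= XMIN_COMMITTED;
    if (caller_marks_dirty)
        buf.dirty = true;

    /* VACUUM sees the hinted in-memory tuple, finds nothing to change,
     * writes nothing, and decides CLOG can be truncated. */
    assert(buf.in_memory.t_infomask & XMIN_COMMITTED);
    clog.oldest_xid = 200;

    /* The buffer is discarded; only a dirty buffer is written back. */
    if (buf.dirty)
        buf.on_disk = buf.in_memory;

    /* The page is read in again and the tuple's status is needed. */
    buf.in_memory = buf.on_disk;
    if (buf.in_memory.t_infomask & XMIN_COMMITTED)
        return true;                        /* hint survived: no CLOG access */
    return clog_lookup(&clog, buf.in_memory.xmin) != STATUS_TRUNCATED;
}
```

With the buffer marked dirty, the hint reaches disk and CLOG is never consulted again. Without it, the re-read tuple points below the new CLOG boundary.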

The odds of such a problem seem exceedingly small ... in other words,
just about right to explain the small numbers of reports we get.

I think what we ought to do to solve this problem permanently is to stop
making the callers of the HeapTupleSatisfiesFoo() routines responsible
for checking for hint bit updates. It would be a lot safer, and AFAICS
not noticeably less efficient, for those routines to call
SetBufferCommitInfoNeedsSave for themselves. This would require adding
to their parameter lists, because they aren't currently told which
buffer the tuple is in, but that's no big deal considering we get to
simplify the calling logic in all the places that are faithfully doing
the t_infomask update check.

Comments?

regards, tom lane


From: "Michael Paesold" <mpaesold(at)gmx(dot)at>
To: <pgsql-hackers(at)postgreSQL(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Why we still see some reports of "could not access transaction status"
Date: 2004-10-14 06:17:41
Message-ID: 007b01c4b1b5$83b30140$ad01a8c0@zaphod
Lists: pgsql-hackers

Tom Lane wrote:

> Having seen a couple recent reports of "could not access status of
> transaction" for old, not-obviously-corrupt transaction numbers, I went
> looking to see if I could find a way that the system could truncate CLOG
> before it's really marked all occurrences of old transaction numbers as
> known-dead or known-good.
>
> I found one.

I was starting to wonder about those reports, too. Actually I was thinking
about bringing this up as soon as I found time. So I am glad you picked
that up yourself -- and found a problem already.

> I think what we ought to do to solve this problem permanently is to stop
...
>
> Comments?

Well, I am not able to comment here, but I can say I usually trust your
judgement.

Best Regards,
Michael Paesold


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why we still see some reports of "could not access transaction status"
Date: 2004-10-14 13:00:46
Message-ID: 20041014130045.GA4174@dcc.uchile.cl
Lists: pgsql-hackers

On Wed, Oct 13, 2004 at 12:18:08PM -0400, Tom Lane wrote:

> I think what we ought to do to solve this problem permanently is to stop
> making the callers of the HeapTupleSatisfiesFoo() routines responsible
> for checking for hint bit updates. It would be a lot safer, and AFAICS
> not noticeably less efficient, for those routines to call
> SetBufferCommitInfoNeedsSave for themselves. This would require adding
> to their parameter lists, because they aren't currently told which
> buffer the tuple is in, but that's no big deal considering we get to
> simplify the calling logic in all the places that are faithfully doing
> the t_infomask update check.
>
> Comments?

I remember seeing this code when coding the phantom Xid idea and
wondering why such an error-prone style was used. It never occurred to
me to change it (or maybe I didn't have the guts to), but now that you
mention it, it certainly seems a good idea.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Tulio: oh, what might this button be for, Juan Carlos?
Policarpo: No, get away, don't touch the console!
Juan Carlos: I'll press it again and again.


From: Gaetano Mendola <mendola(at)bigfoot(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why we still see some reports of "could not access transaction
Date: 2004-10-15 16:39:36
Message-ID: ckoug8$6io$1@floppy.pyrenet.fr
Lists: pgsql-hackers

Tom Lane wrote:
> Having seen a couple recent reports of "could not access status of
> transaction" for old, not-obviously-corrupt transaction numbers, I went
> looking to see if I could find a way that the system could truncate CLOG
> before it's really marked all occurrences of old transaction numbers as
> known-dead or known-good.
>
> I found one.

Are you going to fix it for 8.0 and/or back-patch it?

Regards
Gaetano Mendola


From: Neil Conway <neilc(at)samurai(dot)com>
To: Gaetano Mendola <mendola(at)bigfoot(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why we still see some reports of "could not access
Date: 2004-10-16 13:45:12
Message-ID: 417125E8.2060802@samurai.com
Lists: pgsql-hackers

Gaetano Mendola wrote:
> Are you going to fix it for the 8.0 and/or back patch it ?

http://archives.postgresql.org/pgsql-committers/2004-10/msg00229.php
http://archives.postgresql.org/pgsql-committers/2004-10/msg00191.php

plus backpatches to older branches (REL7_3_STABLE, REL7_2_STABLE).

Has there been any thought about putting out another 7.4 release with
this fix?

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Gaetano Mendola <mendola(at)bigfoot(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Why we still see some reports of "could not access
Date: 2004-10-16 16:20:19
Message-ID: 4265.1097943619@sss.pgh.pa.us
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Has there been any thought about putting out another 7.4 release with
> this fix?

There has, but there are some other open issues I'd like to deal with
first.

If anyone has any pending 7.4 fixes, getting them in in the next
few days would be a Good Plan.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4 changes
Date: 2004-10-17 12:25:51
Message-ID: 417264CF.9080304@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:

>If anyone has any pending 7.4 fixes, getting them in in the next
>few days would be a Good Plan.

Do we want to backport tighter security for plperl? In particular,
insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4 changes
Date: 2004-10-17 13:08:54
Message-ID: 41726EE6.80502@dunslane.net
Lists: pgsql-hackers

Andrew Dunstan wrote:

> Tom Lane wrote:
>
>> If anyone has any pending 7.4 fixes, getting them in in the next
>> few days would be a Good Plan.
>
> Do we want to backport tighter security for plperl? In particular,
> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

And it would also be nice if we could add
contrib/cube/expected/cube_1.out to the 7.4 branch, I think, so that
more platforms could pass the contrib installcheck tests.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4 changes
Date: 2004-10-17 17:52:52
Message-ID: 9402.1098035572@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Do we want to backport tighter security for plperl? In particular,
> insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?

I'd vote not: 7.4.5 => 7.4.6 is not an update that people would expect
to break their plperl code ...

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 7.4 changes
Date: 2004-10-18 16:45:34
Message-ID: 4173F32E.2030109@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:

>Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>Do we want to backport tighter security for plperl? In particular,
>>insisting on Safe.pm >= 2.09 and removing the :base_io set of ops?
>
>I'd vote not: 7.4.5 => 7.4.6 is not an update that people would expect
>to break their plperl code ...

*shrug* OK. Then plperl should probably not be regarded as being as
"trusted" as we would like. Note that old versions of Safe.pm have been
the subject of security advisories such as this one
http://www.securityfocus.com/bid/6111/info/ for some time.

cheers

andrew


From: Neil Conway <neilc(at)samurai(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 7.4 changes
Date: 2004-10-19 07:00:58
Message-ID: 1098169257.1113.301.camel@localhost.localdomain
Lists: pgsql-hackers

On Tue, 2004-10-19 at 02:45, Andrew Dunstan wrote:
> *shrug* OK. Then plperl should probably not be regarded as being as
> "trusted" as we would like. Note that old versions of Safe.pm have been
> the subject of security advisories such as this one
> http://www.securityfocus.com/bid/6111/info/ for some time.

Perhaps a compromise would be to require the newer version of Safe.pm,
but leave the other changes for 8.0. Upgrading Safe.pm can presumably be
done without needing any changes to the rest of one's pl/perl code.

-Neil


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 7.4 changes
Date: 2004-10-19 12:47:20
Message-ID: 41750CD8.6070300@dunslane.net
Lists: pgsql-hackers

Neil Conway wrote:

>On Tue, 2004-10-19 at 02:45, Andrew Dunstan wrote:
>>*shrug* OK. Then plperl should probably not be regarded as being as
>>"trusted" as we would like. Note that old versions of Safe.pm have been
>>the subject of security advisories such as this one
>>http://www.securityfocus.com/bid/6111/info/ for some time.
>
>Perhaps a compromise would be to require the newer version of Safe.pm,
>but leave the other changes for 8.0. Upgrading Safe.pm can presumably be
>done without needing any changes to the rest of one's pl/perl code.

s/the rest of/any of/

Indeed it can.

The other thing I suggested was removing the :base_io set of ops - I
would regard plperl functions that did things like printing to STDOUT as
broken to start with.

But maybe we can just live with what we have and advertise that 8.0's
plperl is more secure.

cheers

andrew


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 7.4 changes
Date: 2004-10-19 13:02:13
Message-ID: 20041019130213.GE4134@dcc.uchile.cl
Lists: pgsql-hackers

On Tue, Oct 19, 2004 at 08:47:20AM -0400, Andrew Dunstan wrote:

> But maybe we can just live with what we have and advertise that 8.0's
> plperl is more secure.

The release notes should point out that 7.4's plperl is insecure unless
the correct version of Safe.pm is installed. Maybe it would work to make it
croak if an unsafe version of Safe.pm is found?

I'm not sure about "living with" known security vulnerabilities. What
about ISPs which give Pg hosting with plperl installed? They surely
will want to know about this.

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)