Re: problems with table corruption continued

Lists: pgsql-hackers
From: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Brian Hirt <bhirt(at)mobygames(dot)com>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>, Brian A Hirt <bhirt(at)berkhirt(dot)com>
Subject: Re: problems with table corruption continued
Date: 2001-12-18 19:25:08
Message-ID: 3705826352029646A3E91C53F7189E32518451@sectorbase2.sectorbase.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> This is trying to get rid of the original copy of a tuple that's been
> moved to another page. The problem is that your index
> function causes a
> table scan, which means that by the time control gets here,
> someone else
> has looked at this tuple and marked it good --- so the initial test of
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> HEAP_XMIN_COMMITTED fails, and the tuple is never removed!
>
> I would say that it's incorrect for vacuum.c to assume that
> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
> tuples during the course of vacuum's processing; after all, the xmin
> definitely does refer to a committed xact, and we can't realistically
> assume that we know what processing will be induced by user-defined
> index functions. Vadim, what do you think? How should we fix this?

But it's incorrect for table scan to mark tuple as good neither.
Looks like we have to add checks for the case
TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
all HeapTupleSatisfies* in tqual.c as we do in
HeapTupleSatisfiesDirty - note comments about uniq btree-s there.

Vadim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
Cc: Brian Hirt <bhirt(at)mobygames(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>, Brian A Hirt <bhirt(at)berkhirt(dot)com>
Subject: Re: problems with table corruption continued
Date: 2001-12-18 19:36:11
Message-ID: 17925.1008704171@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM> writes:
>> I would say that it's incorrect for vacuum.c to assume that
>> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
>> tuples during the course of vacuum's processing; after all, the xmin
>> definitely does refer to a committed xact, and we can't realistically
>> assume that we know what processing will be induced by user-defined
>> index functions. Vadim, what do you think? How should we fix this?

> But it's incorrect for table scan to mark tuple as good neither.

Oh, that makes sense.

> Looks like we have to add checks for the case
> TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
> there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
> all HeapTupleSatisfies* in tqual.c as we do in
> HeapTupleSatisfiesDirty - note comments about uniq btree-s there.

Sounds like a plan. Do you want to work on this, or shall I?

regards, tom lane


From: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>
Cc: "Brian Hirt" <bhirt(at)mobygames(dot)com>, "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Brian A Hirt" <bhirt(at)berkhirt(dot)com>
Subject: Re: problems with table corruption continued
Date: 2001-12-18 21:39:34
Message-ID: EKEJJICOHDIEMGPNIFIJCEPMGDAA.Inoue@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: Tom Lane
>
> "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM> writes:
> >> I would say that it's incorrect for vacuum.c to assume that
> >> HEAP_XMIN_COMMITTED can't become set on HEAP_MOVED_OFF/HEAP_MOVED_IN
> >> tuples during the course of vacuum's processing; after all, the xmin
> >> definitely does refer to a committed xact, and we can't realistically
> >> assume that we know what processing will be induced by user-defined
> >> index functions. Vadim, what do you think? How should we fix this?
>
> > But it's incorrect for table scan to mark tuple as good neither.
>
> Oh, that makes sense.
>
> > Looks like we have to add checks for the case
> > TransactionIdIsCurrentTransactionId(tuple->t_cmin) when
> > there is HEAP_MOVED_OFF or HEAP_MOVED_IN in t_infomask to
> > all HeapTupleSatisfies* in tqual.c as we do in
> > HeapTupleSatisfiesDirty - note comments about uniq btree-s there.

Should the change be TransactionIdIsInProgress(tuple->t_cmin) ?

The cause of Brian's issue was exactly what I was afraid of.
VACUUM is guarded by AccessExclusive lock but IMHO we
shouldn't rely on it too heavily.

regards,
Hiroshi Inoue


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp>
Cc: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>, "Brian Hirt" <bhirt(at)mobygames(dot)com>, "Postgres Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Brian A Hirt" <bhirt(at)berkhirt(dot)com>
Subject: Re: problems with table corruption continued
Date: 2001-12-18 23:01:13
Message-ID: 19188.1008716473@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> Should the change be TransactionIdIsInProgress(tuple->t_cmin) ?

I'd be willing to do
if (tuple->t_cmin is my XID)
do something;
Assert(!TransactionIdIsInProgress(tuple->t_cmin));
if that makes you feel better. But anything that's scanning
a table exclusive-locked by another transaction is broken.
(BTW: contrib/pgstattuple is thus broken. Will fix.)

regards, tom lane


From: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>, Brian Hirt <bhirt(at)mobygames(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>, Brian A Hirt <bhirt(at)berkhirt(dot)com>
Subject: Re: problems with table corruption continued
Date: 2001-12-19 00:19:54
Message-ID: 3C1FDD2A.EC4A8B51@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
>
> "Hiroshi Inoue" <Inoue(at)tpf(dot)co(dot)jp> writes:
> > Should the change be TransactionIdIsInProgress(tuple->t_cmin) ?
>
> I'd be willing to do
> if (tuple->t_cmin is my XID)
> do something;
> Assert(!TransactionIdIsInProgress(tuple->t_cmin));
> if that makes you feel better.

It's useless in hard to reproduce cases.
I've thought but given up many times to propose this
change and my decision seems to have been right because
I see only *Assert* even after this issue.

> But anything that's scanning
> a table exclusive-locked by another transaction is broken.
> (BTW: contrib/pgstattuple is thus broken. Will fix.)

It seems dangerous to me to rely only on developers'
carefulness. There are some places where relations
are opened with NoLock. We had better change them.
I once examined them but AFAIR there are few cases
when they are really opened with NoLock. In most
cases they are already opened previously with other
lock modes. I don't remember well if there was a
really dangerous(scan an relation opened with NoLock)
place.

regards,
Hiroshi Inoue


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
Cc: "Mikheev, Vadim" <vmikheev(at)SECTORBASE(dot)COM>, Brian Hirt <bhirt(at)mobygames(dot)com>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>, Brian A Hirt <bhirt(at)berkhirt(dot)com>
Subject: Re: problems with table corruption continued
Date: 2001-12-19 00:29:23
Message-ID: 19687.1008721763@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp> writes:
> I don't remember well if there was a really dangerous(scan an relation
> opened with NoLock) place.

I have just looked for that, and the only such place is in the new
contrib module pgstattuple. This is clearly broken, since there is
nothing stopping someone from (eg) dropping the relation while
pgsstattuple is scanning it. I think it should get AccessShareLock
on the relation.

The ri_triggers code has a lot of places that open things NoLock,
but it only looks into the relcache entry and doesn't try to scan
the relation. Nonetheless that code bothers me; we could be using
an obsolete relcache entry if someone has just committed an ALTER
TABLE on the relation. Some of the cases may be safe because a lock
is held higher up (eg, on the table from which the trigger was fired)
but I doubt they all are.

regards, tom lane


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: problems with table corruption continued
Date: 2001-12-19 01:39:33
Message-ID: 20011218173431.C65195-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 18 Dec 2001, Tom Lane wrote:

> The ri_triggers code has a lot of places that open things NoLock,
> but it only looks into the relcache entry and doesn't try to scan
> the relation. Nonetheless that code bothers me; we could be using
> an obsolete relcache entry if someone has just committed an ALTER
> TABLE on the relation. Some of the cases may be safe because a lock
> is held higher up (eg, on the table from which the trigger was fired)
> but I doubt they all are.

Probably not, since it looks like that's being done for the other table of
the constraint (not the one on which the trigger was fired).


From: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
To: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: problems with table corruption continued
Date: 2001-12-19 03:56:40
Message-ID: 3C200FF8.6D344DC@tpf.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephan Szabo wrote:
>
> On Tue, 18 Dec 2001, Tom Lane wrote:
>
> > The ri_triggers code has a lot of places that open things NoLock,
> > but it only looks into the relcache entry and doesn't try to scan
> > the relation. Nonetheless that code bothers me; we could be using
> > an obsolete relcache entry if someone has just committed an ALTER
> > TABLE on the relation. Some of the cases may be safe because a lock
> > is held higher up (eg, on the table from which the trigger was fired)
> > but I doubt they all are.
>
> Probably not, since it looks like that's being done for the other table of
> the constraint (not the one on which the trigger was fired).

If a lock is held already, acquiring an AccessShareLock
would cause no addtional conflict. I don't see any reason
to walk a tightrope with NoLock intentionally.

regards,
Hiroshi Inoue


From: Stephan Szabo <sszabo(at)megazone23(dot)bigpanda(dot)com>
To: Hiroshi Inoue <Inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Postgres Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: problems with table corruption continued
Date: 2001-12-19 04:29:24
Message-ID: 20011218202808.Y66402-100000@megazone23.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On Wed, 19 Dec 2001, Hiroshi Inoue wrote:

> Stephan Szabo wrote:
> >
> > On Tue, 18 Dec 2001, Tom Lane wrote:
> >
> > > The ri_triggers code has a lot of places that open things NoLock,
> > > but it only looks into the relcache entry and doesn't try to scan
> > > the relation. Nonetheless that code bothers me; we could be using
> > > an obsolete relcache entry if someone has just committed an ALTER
> > > TABLE on the relation. Some of the cases may be safe because a lock
> > > is held higher up (eg, on the table from which the trigger was fired)
> > > but I doubt they all are.
> >
> > Probably not, since it looks like that's being done for the other table of
> > the constraint (not the one on which the trigger was fired).
>
> If a lock is held already, acquiring an AccessShareLock
> would cause no addtional conflict. I don't see any reason
> to walk a tightrope with NoLock intentionally.

I don't know why NoLock was used there, I was just pointing out that the
odds of a lock being held higher up is probably low.