Re: SSI bug?

Lists: pgsql-hackers
From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: pgsql-hackers(at)postgresql(dot)org
Subject: SSI bug?
Date: 2011-02-10 08:48:26
Message-ID: 20110210084826.B83FA19CE68@mail.netbsd.org

hi,

it seems that PredicateLockTupleRowVersionLink sometimes creates
a loop of targets (it finds an existing 'newtarget' whose nextVersionOfRow
chain points back to the 'oldtarget'), which later causes
CheckTargetForConflictsIn to loop forever.
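
[For illustration, the cycle described above can be modeled with a minimal sketch. The types below are hypothetical, simplified stand-ins, not the real predicate.c structures; the point is only that a walker following the version links out of such a chain never terminates unless it checks for revisits:]

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the predicate lock target:
 * only the row-version link matters for this illustration. */
typedef struct Target
{
    struct Target *priorVersionOfRow;
} Target;

/* Floyd's tortoise-and-hare: returns 1 if following priorVersionOfRow
 * from t ever revisits a node, 0 if the chain terminates in NULL. */
int
chain_has_cycle(const Target *t)
{
    const Target *slow = t;
    const Target *fast = t;

    while (fast != NULL && fast->priorVersionOfRow != NULL)
    {
        slow = slow->priorVersionOfRow;
        fast = fast->priorVersionOfRow->priorVersionOfRow;
        if (slow == fast)
            return 1;           /* chain loops back on itself */
    }
    return 0;                   /* properly terminated chain */
}
```

[A debug walker with a guard like this reports the loop instead of hanging, which is essentially what the instrumentation patch later in the thread does.]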

YAMAMOTO Takashi


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-11 15:48:18
Message-ID: 4D5505E2020000250003A83C@gw.wicourts.gov

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

> it seems that PredicateLockTupleRowVersionLink sometimes creates
> a loop of targets (it finds an existing 'newtarget' whose
> nextVersionOfRow chain points back to the 'oldtarget'), which
> later causes CheckTargetForConflictsIn to loop forever.

Is this a hypothetical risk based on looking at the code, or have
you seen this actually happen? Either way, could you provide more
details? (A reproducible test case would be ideal.)

This being the newest part of the code, I'll grant that it is the
most likely to have an unidentified bug; but given that the pointers
are from one predicate lock target structure identified by a tuple
ID to one identified by the tuple ID of the next version of the row,
it isn't obvious to me how a cycle could develop.

-Kevin


From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-12 07:08:21
Message-ID: 20110212070822.46BE319D15D@mail.netbsd.org

hi,

> YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:
>
>> it seems that PredicateLockTupleRowVersionLink sometimes creates
>> a loop of targets (it finds an existing 'newtarget' whose
>> nextVersionOfRow chain points back to the 'oldtarget'), which
>> later causes CheckTargetForConflictsIn to loop forever.
>
> Is this a hypothetical risk based on looking at the code, or have
> you seen this actually happen? Either way, could you provide more
> details? (A reproducible test case would be ideal.)

i have seen this actually happen. i've confirmed the creation of the loop
with the attached patch. it's easily reproducible with my application.
i can provide the full source code of my application if you want
(but it isn't easy to run unless you are familiar with a recent
version of NetBSD).
i haven't found a smaller reproducible test case yet.

YAMAMOTO Takashi

>
> This being the newest part of the code, I'll grant that it is the
> most likely to have an unidentified bug; but given that the pointers
> are from one predicate lock target structure identified by a tuple
> ID to one identified by the tuple ID of the next version of the row,
> it isn't obvious to me how a cycle could develop.
>
> -Kevin

Attachment Content-Type Size
a.diff text/plain 945 bytes

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-12 17:07:52
Message-ID: 4D566A08020000250003A90B@gw.wicourts.gov

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

> i have seen this actually happen. i've confirmed the creation of
> the loop with the attached patch. it's easily reproducible with
> my application. i can provide the full source code of my
> application if you want (but it isn't easy to run unless you are
> familiar with a recent version of NetBSD).
> i haven't found a smaller reproducible test case yet.

I've never used NetBSD, so maybe a few details will help point me in
the right direction faster than the source code.

Has your application ever triggered any of the assertions in the
code? (In particular, it would be interesting if it ever hit the
one right above where you patched.)

How long was the loop?

Did you notice whether the loop involved multiple tuples within a
single page?

Did this coincide with an autovacuum of the table?

These last two are of interest because it seems likely that such a
cycle might be related to this new code not properly allowing for
some aspect of tuple cleanup.

Thanks for finding this and reporting it, and thanks in advance for
any further detail you can provide.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-12 17:53:57
Message-ID: 4D5674D5020000250003A911@gw.wicourts.gov

I wrote:

> it seems likely that such a cycle might be related to this new
> code not properly allowing for some aspect of tuple cleanup.

I found a couple places where cleanup could let these fall through
the cracks long enough to get stale and still be around when a tuple
ID is re-used, causing problems. Please try the attached patch and
see if it fixes the problem for you.

If it does, then there's no need to try to track the other things I
was asking about.

Thanks!

-Kevin

Attachment Content-Type Size
ssi-cleanup-fix.patch text/plain 2.4 KB

From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-14 03:08:24
Message-ID: 20110214030824.B545C19D6D7@mail.netbsd.org

hi,

> I wrote:
>
>> it seems likely that such a cycle might be related to this new
>> code not properly allowing for some aspect of tuple cleanup.
>
> I found a couple places where cleanup could let these fall through
> the cracks long enough to get stale and still be around when a tuple
> ID is re-used, causing problems. Please try the attached patch and
> see if it fixes the problem for you.
>
> If it does, then there's no need to try to track the other things I
> was asking about.

thanks. unfortunately the problem still happens with the patch.

YAMAMOTO Takashi

>
> Thanks!
>
> -Kevin


From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-14 05:09:23
Message-ID: 20110214050924.2E5E119CE7A@mail.netbsd.org

hi,

all of the following answers are with the patch you provided in
other mail applied.

> YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:
>
>> i have seen this actually happen. i've confirmed the creation of
>> the loop with the attached patch. it's easily reproducible with
>> my application. i can provide the full source code of my
>> application if you want (but it isn't easy to run unless you are
>> familiar with a recent version of NetBSD).
>> i haven't found a smaller reproducible test case yet.
>
> I've never used NetBSD, so maybe a few details will help point me in
> the right direction faster than the source code.
>
> Has your application ever triggered any of the assertions in the
> code? (In particular, it would be interesting if it ever hit the
> one right above where you patched.)

the assertion right above is sometimes triggered, sometimes not.

>
> How long was the loop?

see below.

> Did you notice whether the loop involved multiple tuples within a
> single page?

if i understand correctly, yes.

the following is a snippet of my debug code (dump targets when
CheckTargetForConflictsIn loops >1000 times) and its output.
the same locktag_field3 value means the same page, right?

+ for (t = target, i = 0; t != NULL; i++) {
+ elog(WARNING, "[%u] target %p tag %" PRIx32 ":%" PRIx32 ":%" PRIx32
+ ":%" PRIx16 ":%" PRIx16 " prior %p next %p", i, t,
+ t->tag.locktag_field1,
+ t->tag.locktag_field2,
+ t->tag.locktag_field3,
+ t->tag.locktag_field4,
+ t->tag.locktag_field5,
+ t->priorVersionOfRow,
+ t->nextVersionOfRow);
+ t = t->priorVersionOfRow;
+ if (t == target) {
+ elog(WARNING, "found a loop");
+ break;
+ }
+ }

WARNING: [0] target 0xbb51f238 tag 4000:4017:53b:6c:0 prior 0xbb51f350 next 0xbb51f350
WARNING: [1] target 0xbb51f350 tag 4000:4017:53b:69:0 prior 0xbb51f238 next 0xbb51f238
WARNING: found a loop

another sample:

WARNING: [0] target 0xbb51f530 tag 4000:4017:565:ae:0 prior 0xbb51f1e8 next 0xbb51f300
WARNING: [1] target 0xbb51f1e8 tag 4000:4017:565:ad:0 prior 0xbb51f580 next 0xbb51f530
WARNING: [2] target 0xbb51f580 tag 4000:4017:565:ac:0 prior 0xbb51f300 next 0xbb51f1e8
WARNING: [3] target 0xbb51f300 tag 4000:4017:565:ab:0 prior 0xbb51f530 next 0xbb51f580
WARNING: found a loop
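
[For readers decoding the dumps: going by the field naming in predicate.c, field1/field2/field3/field4 of a target tag are the database OID, relation OID, block (page) number, and tuple offset (the real code packs these with macros such as SET_PREDICATELOCKTARGETTAG_TUPLE). A small sketch with illustrative types shows the two entries in the first dump share a page and differ only in tuple offset:]

```c
#include <stdint.h>

/* Illustrative decoded form of the dumped tag values; not the real
 * PREDICATELOCKTARGETTAG layout, just the fields as labeled above. */
typedef struct
{
    uint32_t db;        /* locktag_field1: database OID */
    uint32_t rel;       /* locktag_field2: relation OID */
    uint32_t page;      /* locktag_field3: block number */
    uint16_t offset;    /* locktag_field4: tuple offset */
} DecodedTag;

/* Two tuple-level tags refer to the same heap page iff the database,
 * relation, and block number all match. */
int
same_page(DecodedTag a, DecodedTag b)
{
    return a.db == b.db && a.rel == b.rel && a.page == b.page;
}
```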

the table seems mostly hot-updated, if it matters.

hoge=# select * from pg_stat_user_tables where relid=16407;
-[ RECORD 1 ]-----+--------------------
relid | 16407
schemaname | pgfs
relname | file
seq_scan | 0
seq_tup_read | 0
idx_scan | 53681
idx_tup_fetch | 52253
n_tup_ins | 569
n_tup_upd | 12054
n_tup_del | 476
n_tup_hot_upd | 12041
n_live_tup | 93
n_dead_tup | 559
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 4922528128875102208
autoanalyze_count | 7598807461784802080

(values in the last two columns seem bogus;
i don't know if that's related or not.)

> Did this coincide with an autovacuum of the table?

no.
(assuming that autovacuum=off in postgresql.conf is enough to exclude
the possibility.)

>
> These last two are of interest because it seems likely that such a
> cycle might be related to this new code not properly allowing for
> some aspect of tuple cleanup.
>
> Thanks for finding this and reporting it, and thanks in advance for
> any further detail you can provide.

thanks for looking.

YAMAMOTO Takashi

>
> -Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-14 10:40:37
Message-ID: 4D5906A5.8040602@enterprisedb.com

Looking at the prior/next version chaining, aside from the looping
issue, isn't it broken by lock promotion too? There's a check in
RemoveTargetIfNoLongerUsed() so that we don't release a lock target if
its priorVersionOfRow is set, but what if the tuple lock is promoted to
a page level lock first, and PredicateLockTupleRowVersionLink() is
called only after that? Or can that not happen because of something else
that I'm missing?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-14 18:10:49
Message-ID: 4D591BC9020000250003A98E@gw.wicourts.gov

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Looking at the prior/next version chaining, aside from the
> looping issue, isn't it broken by lock promotion too? There's a
> check in RemoveTargetIfNoLongerUsed() so that we don't release a
> lock target if its priorVersionOfRow is set, but what if the tuple
> lock is promoted to a page level lock first, and
> PredicateLockTupleRowVersionLink() is called only after that? Or
> can that not happen because of something else that I'm missing?

I had to ponder that a while. Here's my thinking.

Predicate locks only matter when there is a write. Predicate locks
on heap tuples only matter when there is an UPDATE or DELETE of a
locked tuple. The problem these links are addressing is that an
intervening transaction might UPDATE the tuple between the
read of the tuple and a later UPDATE or DELETE. We want the second
UPDATE to see that it conflicts with a read from before the first
UPDATE. The first UPDATE creates the link from the "before" tuple
ID to the "after" tuple ID at the target level. What predicate locks
exist on the second target are irrelevant when it comes to seeing
the conflict between the second UPDATE (or DELETE) and the initial
read. So I don't see where granularity promotion for locks on the
second target is a problem as long as the target itself doesn't get
deleted because of the link to the prior version of the tuple.

Promotion of the lock granularity on the prior tuple is where we
have problems. If the two tuple versions are in separate pages then
the second UPDATE could miss the conflict. My first thought was to
fix that by requiring promotion of a predicate lock on a tuple to
jump straight to the relation level if nextVersionOfRow is set for
the lock target and it points to a tuple in a different page. But
that doesn't cover a situation where we have a heap tuple predicate
lock which gets promoted to page granularity before the tuple is
updated. To handle that we would need to say that an UPDATE to a
tuple on a page which is predicate locked by the transaction would
need to be promoted to relation granularity if the new version of
the tuple wasn't on the same page as the old version.
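
[The escalation rule described above reduces to a simple check; the sketch below is only an illustration of the proposal, with made-up names and enum, not PostgreSQL's actual lock-promotion API:]

```c
#include <stdint.h>

/* Illustrative predicate lock granularities. */
typedef enum
{
    GRAN_TUPLE,
    GRAN_PAGE,
    GRAN_RELATION
} Granularity;

/* On UPDATE of a tuple covered by a predicate lock held at 'held'
 * granularity: if the new version lands on the same page, the
 * existing lock still covers it; otherwise escalate straight to
 * relation granularity so the conflict can't be missed. */
Granularity
granularity_after_update(Granularity held,
                         uint32_t old_page, uint32_t new_page)
{
    if (old_page == new_page)
        return held;
    return GRAN_RELATION;
}
```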

That's all doable without too much trouble, but more than I'm
likely to get done today. It would be good if someone could confirm
my thinking on this first, too.

That said, the above is about eliminating false negatives from some
corner cases which escaped notice until now. I don't think the
changes described above will do anything to prevent the problems
reported by YAMAMOTO Takashi. Unless I'm missing something, it
sounds like tuple IDs are being changed or reused while predicate
locks are held on the tuples. That's probably not going to be
overwhelmingly hard to fix if we can identify how that can happen.
I tried to cover HOT issues, but it seems likely I missed something.
:-( I will be looking at it.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-14 19:04:41
Message-ID: 4D592869020000250003A99A@gw.wicourts.gov

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

>> Did you notice whether the loop involved multiple tuples within a
>> single page?
>
> if i understand correctly, yes.
>
> the following is a snippet of my debug code (dump targets when
> CheckTargetForConflictsIn loops >1000 times) and its
> output. the same locktag_field3 value means the same page, right?

Right.

> the table seems mostly hot-updated, if it matters.

> idx_scan | 53681
> idx_tup_fetch | 52253
> n_tup_ins | 569
> n_tup_upd | 12054
> n_tup_del | 476
> n_tup_hot_upd | 12041
> n_live_tup | 93
> n_dead_tup | 559

That probably matters a lot.

> analyze_count | 4922528128875102208
> autoanalyze_count | 7598807461784802080
>
> (values in the last two columns seem bogus;
> i don't know if that's related or not.)

That seems unlikely to be related to this problem. It sure does
look odd, though. Maybe post that in a separate thread?

Thanks for all the additional info. I'll keep digging.

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-15 08:19:08
Message-ID: 4D5A36FC.6010203@enterprisedb.com

On 14.02.2011 20:10, Kevin Grittner wrote:
> Promotion of the lock granularity on the prior tuple is where we
> have problems. If the two tuple versions are in separate pages then
> the second UPDATE could miss the conflict. My first thought was to
> fix that by requiring promotion of a predicate lock on a tuple to
> jump straight to the relation level if nextVersionOfRow is set for
> the lock target and it points to a tuple in a different page. But
> that doesn't cover a situation where we have a heap tuple predicate
> lock which gets promoted to page granularity before the tuple is
> updated. To handle that we would need to say that an UPDATE to a
> tuple on a page which is predicate locked by the transaction would
> need to be promoted to relation granularity if the new version of
> the tuple wasn't on the same page as the old version.

Yeah, promoting the original lock on the UPDATE was my first thought too.

Another idea is to duplicate the original predicate lock on the first
update, so that the original reader holds a lock on both row versions. I
think that would ultimately be simpler as we wouldn't need the
next-prior chains anymore.

For example, suppose that transaction X is holding a predicate lock on
tuple A. Transaction Y updates tuple A, creating a new tuple B.
Transaction Y sees that X holds a lock on tuple A (or the page
containing A), so it acquires a new predicate lock on tuple B on behalf
of X.

If the updater aborts, the lock on the new tuple needs to be cleaned up,
so that it doesn't get confused with a later tuple that's stored in the
same physical location. We could store the xmin of the tuple in the
predicate lock to check for that. Whenever you check for conflict, if
the xmin of the lock doesn't match the xmin on the tuple, you know that
the lock belonged to an old dead tuple stored in the same location, and
can be simply removed as the tuple doesn't exist anymore.
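
[That staleness test comes down to one comparison. A sketch with illustrative types (stamping the lock with an xmin is the proposal above, not something the current lock structure does):]

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Illustrative: a predicate lock stamped with the xmin of the tuple
 * it was originally taken on, per the proposal above. */
typedef struct
{
    TransactionId tuple_xmin;
} StampedLock;

/* If the tuple now stored at that location has a different xmin, the
 * lock belonged to an old dead tuple whose slot was reused, so the
 * lock can simply be discarded. */
int
lock_is_stale(const StampedLock *lock, TransactionId xmin_at_location)
{
    return lock->tuple_xmin != xmin_at_location;
}
```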

> That said, the above is about eliminating false negatives from some
> corner cases which escaped notice until now. I don't think the
> changes described above will do anything to prevent the problems
> reported by YAMAMOTO Takashi.

Agreed, it's a separate issue. Although if we change the way we handle
the read-update-update problem, the other issue might go away too.

> Unless I'm missing something, it
> sounds like tuple IDs are being changed or reused while predicate
> locks are held on the tuples. That's probably not going to be
> overwhelmingly hard to fix if we can identify how that can happen.
> I tried to cover HOT issues, but it seems likely I missed something.

Storing the xmin of the original tuple would probably help with that
too. But it would be nice to understand and be able to reproduce the
issue first.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-16 22:13:35
Message-ID: 20110216221335.C9A2119CF21@mail.netbsd.org

hi,

might be unrelated to the loop problem, but...

i got the following SEGV when running vacuum on a table.
(the line numbers in predicate.c are different as i have local modifications.)
oldlocktag.myTarget was NULL.
it seems that TransferPredicateLocksToNewTarget sometimes uses stack garbage
for newpredlocktag.myTarget. vacuum on the table succeeded with the attached
patch. the latter part of the patch was necessary to avoid targetList
corruption, which later seems to make DeleteChildTargetLocks loop infinitely.

YAMAMOTO Takashi

#0 0x0823cf6c in PredicateLockAcquire (targettag=0xbfbfa734)
at predicate.c:1835
#1 0x0823f18a in PredicateLockPage (relation=0x99b4dcf0, blkno=1259)
at predicate.c:2206
#2 0x080ac978 in _bt_search (rel=0x99b4dcf0, keysz=2, scankey=0x99a05040,
nextkey=0 '\0', bufP=0xbfbfa894, access=1) at nbtsearch.c:97
#3 0x080a996d in _bt_pagedel (rel=0x99b4dcf0, buf=<value optimized out>,
stack=0x0) at nbtpage.c:1059
#4 0x080aacc2 in btvacuumscan (info=0xbfbfbcc4, stats=0x99a01328,
callback=0x8184d50 <lazy_tid_reaped>, callback_state=0x99a012e0,
cycleid=13675) at nbtree.c:981
#5 0x080ab15c in btbulkdelete (fcinfo=0xbfbfb9e0) at nbtree.c:573
#6 0x082fde74 in FunctionCall4 (flinfo=0x99b86958, arg1=3217013956, arg2=0,
arg3=135810384, arg4=2577404640) at fmgr.c:1437
#7 0x080a4fd0 in index_bulk_delete (info=0xbfbfbcc4, stats=0x0,
callback=0x8184d50 <lazy_tid_reaped>, callback_state=0x99a012e0)
at indexam.c:738
#8 0x08184cd4 in lazy_vacuum_index (indrel=0x99b4dcf0, stats=0x99a023e0,
vacrelstats=0x99a012e0) at vacuumlazy.c:938
#9 0x081854b6 in lazy_vacuum_rel (onerel=0x99b47650, vacstmt=0x99b059d0,
bstrategy=0x99a07018, scanned_all=0xbfbfcfd8 "") at vacuumlazy.c:762
#10 0x08184265 in vacuum_rel (relid=16424, vacstmt=0x99b059d0,
do_toast=1 '\001', for_wraparound=0 '\0', scanned_all=0xbfbfcfd8 "")
at vacuum.c:978
#11 0x081845ea in vacuum (vacstmt=0x99b059d0, relid=0, do_toast=1 '\001',
bstrategy=0x0, for_wraparound=0 '\0', isTopLevel=1 '\001') at vacuum.c:230
#12 0xbbab50c3 in pgss_ProcessUtility (parsetree=0x99b059d0,
queryString=0x99b05018 "vacuum (verbose,analyze) pgfs.dirent;",
params=0x0, isTopLevel=1 '\001', dest=0x99b05b80,
completionTag=0xbfbfd21a "") at pg_stat_statements.c:603
#13 0x082499ea in PortalRunUtility (portal=0x99b33018, utilityStmt=0x99b059d0,
isTopLevel=1 '\001', dest=0x99b05b80, completionTag=0xbfbfd21a "")
at pquery.c:1191
#14 0x0824a79e in PortalRunMulti (portal=0x99b33018, isTopLevel=4 '\004',
dest=0x99b05b80, altdest=0x99b05b80, completionTag=0xbfbfd21a "")
at pquery.c:1298
#15 0x0824b21a in PortalRun (portal=0x99b33018, count=2147483647,
isTopLevel=1 '\001', dest=0x99b05b80, altdest=0x99b05b80,
completionTag=0xbfbfd21a "") at pquery.c:822
#16 0x08247dc7 in exec_simple_query (
query_string=0x99b05018 "vacuum (verbose,analyze) pgfs.dirent;")
at postgres.c:1059
#17 0x08248a79 in PostgresMain (argc=2, argv=0xbb912650,
username=0xbb9125c0 "takashi") at postgres.c:3943
#18 0x0820e231 in ServerLoop () at postmaster.c:3590
#19 0x0820ef88 in PostmasterMain (argc=3, argv=0xbfbfe59c) at postmaster.c:1110
#20 0x081b6439 in main (argc=3, argv=0xbfbfe59c) at main.c:199
(gdb) list
1830 offsetof(PREDICATELOCK, xactLink));
1831
1832 oldlocktag = predlock->tag;
1833 Assert(oldlocktag.myXact == sxact);
1834 oldtarget = oldlocktag.myTarget;
1835 oldtargettag = oldtarget->tag;
1836
1837 if (TargetTagIsCoveredBy(oldtargettag, *newtargettag))
1838 {
1839 uint32 oldtargettaghash;
(gdb)
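
[The bug class here, a stack local whose field is assigned on only some code paths and then read unconditionally, can be sketched as below (illustrative types, not the real predicate.c code). The fix pattern is simply to give the field a defined initial value:]

```c
#include <stddef.h>

/* Illustrative stand-in for the lock tag; only myTarget matters. */
typedef struct
{
    void *myTarget;
} TagSketch;

/* Fix pattern: initialize up front, so any path that skips the
 * assignment yields a checkable NULL rather than stack garbage
 * that gets dereferenced later. */
TagSketch
make_tag(void *target_or_null)
{
    TagSketch tag = { NULL };   /* defined initial state */

    if (target_or_null != NULL)
        tag.myTarget = target_or_null;
    return tag;
}
```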

Attachment Content-Type Size
a.diff text/plain 1.1 KB

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-16 23:11:49
Message-ID: 4D5C0555020000250003AB2F@gw.wicourts.gov

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

> might be unrelated to the loop problem, but...
>
> i got the following SEGV when running vacuum on a table.

> vacuum on the table succeeded with the attached patch.

Thanks! I appreciate the heavy testing and excellent diagnostics.
On the face of it, this doesn't look related to the other problem,
but I'll post again soon after closer review.

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: Kevin(dot)Grittner(at)wicourts(dot)gov, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-17 19:45:22
Message-ID: 20110217194522.GA98448@csail.mit.edu

On Wed, Feb 16, 2011 at 10:13:35PM +0000, YAMAMOTO Takashi wrote:
> i got the following SEGV when running vacuum on a table.
> (the line numbers in predicate.c are different as i have local modifications.)
> oldlocktag.myTarget was NULL.
> it seems that TransferPredicateLocksToNewTarget sometimes uses stack garbage
> for newpredlocktag.myTarget. vacuum on the table succeeded with the attached
> patch. the latter part of the patch was necessary to avoid targetList
> corruption, which later seems to make DeleteChildTargetLocks loop infinitely.

Oops. Those are both definitely bugs (and my fault). Your patch looks
correct. Thanks for catching that!

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-02-17 22:11:28
Message-ID: 4D5D48B0020000250003ACA2@gw.wicourts.gov

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:

> Oops. Those are both definitely bugs (and my fault). Your patch
> looks correct. Thanks for catching that!

Could a committer please apply the slightly modified version here?:

http://archives.postgresql.org/message-id/4D5C46BB020000250003AB40@gw.wicourts.gov

It is a pretty straightforward bug fix to initialize some currently
uninitialized data which is causing occasional but severe problems,
especially during vacuum.

I'm still working on the other issues raised by YAMAMOTO Takashi and
Heikki. I expect to have a solution for those issues this weekend,
but this bug fix is needed regardless of how those issues are
settled.

-Kevin


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-02-18 11:05:50
Message-ID: AANLkTim3Ljfjemj1M5464L8WUyZG7ONEaCdrwTw-Fbqq@mail.gmail.com

On Thu, Feb 17, 2011 at 23:11, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
>
>> Oops. Those are both definitely bugs (and my fault). Your patch
>> looks correct. Thanks for catching that!
>
> Could a committer please apply the slightly modified version here?:
>
> http://archives.postgresql.org/message-id/4D5C46BB020000250003AB40@gw.wicourts.gov
>
> It is a pretty straightforward bug fix to initialize some currently
> uninitialized data which is causing occasional but severe problems,
> especially during vacuum.

Done, thanks.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/