Re: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin

Lists: pgsql-hackers
From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin
Date: 2014-12-30 23:51:05
Message-ID: CAB7nPqSqvKYw5_UHnWBj7BwKjk_FpQXEDJEZ+OwNcvDB_SuRTg@mail.gmail.com

Hi all.

markhor has run for the first time in 8 days, and something in the range
e703261..72dd233 is making the brin regression test crash.
See here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2014-12-30%2020%3A58%3A49

Regards,
--
Michael


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin
Date: 2014-12-31 01:39:17
Message-ID: 20141231013917.GW1645@alvh.no-ip.org

Michael Paquier wrote:
> Hi all.
>
> markhor has run for the first time in 8 days, and something in the range
> e703261..72dd233 is making the brin regression test crash.
> See here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2014-12-30%2020%3A58%3A49

This shows that the crash was in the object_address test, not brin.
Will research.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin
Date: 2014-12-31 13:02:40
Message-ID: 20141231130240.GA1457@alvh.no-ip.org

Alvaro Herrera wrote:
> Michael Paquier wrote:
> > Hi all.
> >
> > markhor has run for the first time in 8 days, and something in the range
> > e703261..72dd233 is making the brin regression test crash.
> > See here:
> > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2014-12-30%2020%3A58%3A49
>
> This shows that the crash was in the object_address test, not brin.
> Will research.

I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
the object_address test. The backtrace is pretty strange:

#0 0x00007f08ce674107 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f08ce6754e8 in __GI_abort () at abort.c:89
#2 0x00000000007ac071 in ExceptionalCondition (
    conditionName=conditionName@entry=0x800f28 "!(keylen < 64)",
    errorType=errorType@entry=0x7e724f "FailedAssertion",
    fileName=fileName@entry=0x800ef0 "/pgsql/source/master/src/backend/access/hash/hashfunc.c", lineNumber=lineNumber@entry=147)
    at /pgsql/source/master/src/backend/utils/error/assert.c:54
#3 0x0000000000494a93 in hashname (fcinfo=fcinfo@entry=0x7fff244324a0)
    at /pgsql/source/master/src/backend/access/hash/hashfunc.c:147
#4 0x00000000007b450d in DirectFunctionCall1Coll (func=0x494a50 <hashname>,
    collation=collation@entry=0, arg1=<optimized out>)
    at /pgsql/source/master/src/backend/utils/fmgr/fmgr.c:1027
#5 0x0000000000797aca in CatalogCacheComputeHashValue (cache=cache@entry=0x10367d8,
    nkeys=<optimized out>, cur_skey=cur_skey@entry=0x7fff244328e0)
    at /pgsql/source/master/src/backend/utils/cache/catcache.c:212
#6 0x0000000000798ff7 in SearchCatCache (cache=0x10367d8, v1=18241016, v2=6, v3=11, v4=0)
    at /pgsql/source/master/src/backend/utils/cache/catcache.c:1149
#7 0x00000000007a67ae in GetSysCacheOid (cacheId=cacheId@entry=15, key1=<optimized out>,
    key2=key2@entry=6, key3=key3@entry=11, key4=key4@entry=0)
    at /pgsql/source/master/src/backend/utils/cache/syscache.c:988
#8 0x0000000000504699 in get_collation_oid (name=name@entry=0x11655c0,
    missing_ok=missing_ok@entry=0 '\000')
    at /pgsql/source/master/src/backend/catalog/namespace.c:3323
#9 0x000000000050d8dc in get_object_address (objtype=objtype@entry=OBJECT_COLLATION,
    objname=objname@entry=0x11655c0, objargs=objargs@entry=0x0,
    relp=relp@entry=0x7fff24432c28, lockmode=lockmode@entry=1,
    missing_ok=missing_ok@entry=0 '\000')
    at /pgsql/source/master/src/backend/catalog/objectaddress.c:704
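
For reference, the failed assertion in frames #2/#3 is hashname()'s sanity check that a name key is shorter than NAMEDATALEN (64 by default) before it is hashed. The following is a minimal standalone sketch of that check, not the backend source: toy_hashname() and toy_hash_bytes() are hypothetical names, and FNV-1a stands in for PostgreSQL's hash_any(). It illustrates why a pointer to garbage, rather than to a real name, trips this particular assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Default identifier buffer size; matches the "keylen < 64" in the assertion. */
#define NAMEDATALEN 64

/* Hypothetical stand-in for hash_any(): 32-bit FNV-1a, not PostgreSQL's hash. */
static uint32_t
toy_hash_bytes(const unsigned char *k, size_t len)
{
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= k[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Sketch of the sanity check hashname() applies before hashing a name key:
 * a well-formed name is NUL-terminated within NAMEDATALEN bytes.  Returns 0
 * and stores the hash on success; returns -1 where an assert-enabled backend
 * would instead abort with "!(keylen < 64)".
 */
static int
toy_hashname(const char *key, uint32_t *hash_out)
{
    size_t keylen;

    assert(key != NULL && hash_out != NULL);
    keylen = strlen(key);
    if (!(keylen < NAMEDATALEN))
        return -1;
    *hash_out = toy_hash_bytes((const unsigned char *) key, keylen);
    return 0;
}
```

A valid name such as "en_US" passes the check, while a buffer of 127 non-NUL bytes (for example, memory overwritten with 0x7F) makes strlen() return 127, so the check fails just like the "!(keylen < 64)" assertion in the backtrace.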



From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin
Date: 2014-12-31 14:37:26
Message-ID: 20141231143726.GD19836@alap3.anarazel.de

On 2014-12-31 10:02:40 -0300, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
> > Michael Paquier wrote:
> > > Hi all.
> > >
> > > markhor has run for the first time in 8 days, and something in the range
> > > e703261..72dd233 is making the brin regression test crash.
> > > See here:
> > > http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=markhor&dt=2014-12-30%2020%3A58%3A49
> >
> > This shows that the crash was in the object_address test, not brin.
> > Will research.
>
> I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
> the object_address test. The backtrace is pretty strange:

Hard to say without more detail, but my guess is that the argument to
get_collation_oid() isn't actually valid. For one, that would explain the
error; for another, the pointer's value (name=name@entry=0x11655c0) is
suspiciously low.

> #8 0x0000000000504699 in get_collation_oid (name=name@entry=0x11655c0,
>     missing_ok=missing_ok@entry=0 '\000')
>     at /pgsql/source/master/src/backend/catalog/namespace.c:3323
> #9 0x000000000050d8dc in get_object_address (objtype=objtype@entry=OBJECT_COLLATION,
>     objname=objname@entry=0x11655c0, objargs=objargs@entry=0x0,
>     relp=relp@entry=0x7fff24432c28, lockmode=lockmode@entry=1,
>     missing_ok=missing_ok@entry=0 '\000')
>     at /pgsql/source/master/src/backend/catalog/objectaddress.c:704

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin
Date: 2014-12-31 15:01:19
Message-ID: 6347.1420038079@sss.pgh.pa.us

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-12-31 10:02:40 -0300, Alvaro Herrera wrote:
>> I can reproduce the crash in a CLOBBER_CACHE_ALWAYS build in
>> the object_address test. The backtrace is pretty strange:

> Hard to say without more detail, but my guess is that the argument to
> get_collation_oid() isn't actually valid. For one, that would explain the
> error; for another, the pointer's value (name=name@entry=0x11655c0) is
> suspiciously low.

Given that CLOBBER_CACHE_ALWAYS seems to make it fail reliably, the
obvious explanation is that what's being passed is a pointer into
catcache or relcache storage that isn't guaranteed to be valid for
long enough. The given backtrace doesn't go down far enough to show
where the bogus input came from, but I'm betting that something is
returning to SQL a string it got from cache without pstrdup'ing it.
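
This failure mode can be simulated outside the backend. The sketch below is hypothetical and is not backend code: ToyCacheEntry stands in for a catcache entry, toy_cache_invalidate() for a cache flush under CLOBBER_CACHE_ALWAYS, and toy_pstrdup() for pstrdup(). It assumes that flushed cache memory gets overwritten with 0x7F bytes, as assert-enabled builds typically do for freed memory via CLOBBER_FREED_MEMORY; to keep the demo well-defined, the toy entry is never actually freed and keeps a trailing NUL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAMEDATALEN 64   /* default identifier limit checked by hashname() */
#define ENTRY_SIZE 128   /* hypothetical size of the toy cache entry */

/* Hypothetical stand-in for a catcache entry that holds an object name. */
typedef struct ToyCacheEntry
{
    char name[ENTRY_SIZE];
} ToyCacheEntry;

static ToyCacheEntry toy_cache;

/* Load a (possibly truncated) name into the toy cache entry. */
static void
toy_cache_load(const char *name)
{
    size_t n = strlen(name);

    if (n >= ENTRY_SIZE)
        n = ENTRY_SIZE - 1;
    memcpy(toy_cache.name, name, n);
    toy_cache.name[n] = '\0';
}

/*
 * Simulate a cache flush: the entry's memory is overwritten with 0x7F bytes,
 * as freed memory is in assert-enabled builds.  The trailing NUL is kept only
 * so that reading through a stale pointer stays well-defined in this demo.
 */
static void
toy_cache_invalidate(void)
{
    memset(toy_cache.name, 0x7F, ENTRY_SIZE - 1);
    toy_cache.name[ENTRY_SIZE - 1] = '\0';
}

/* Hypothetical stand-in for pstrdup(): copy into caller-owned memory. */
static char *
toy_pstrdup(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy == NULL)
        abort();            /* palloc() would ereport(ERROR) instead */
    memcpy(copy, s, len);
    return copy;
}

/* Buggy pattern: returns a pointer into cache-owned storage. */
static const char *
get_name_buggy(void)
{
    return toy_cache.name;
}

/* Fixed pattern: returns a private copy that survives cache flushes. */
static char *
get_name_fixed(void)
{
    return toy_pstrdup(toy_cache.name);
}

/* Mirrors hashname()'s length check: a real name fits within NAMEDATALEN. */
static int
looks_like_valid_name(const char *s)
{
    return strlen(s) < NAMEDATALEN;
}
```

After toy_cache_load("en_US") and a subsequent toy_cache_invalidate(), the copy from get_name_fixed() still reads "en_US", whereas the pointer from get_name_buggy() now sees a 127-byte run of 0x7F bytes, long enough to fail a NAMEDATALEN check like the one in hashname(). The fix pattern is to copy the string before anything can flush the cache.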

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure on markhor with CLOBBER_CACHE_ALWAYS for test brin
Date: 2014-12-31 16:21:55
Message-ID: 20141231162155.GC1457@alvh.no-ip.org

Tom Lane wrote:

> Given that CLOBBER_CACHE_ALWAYS seems to make it fail reliably, the
> obvious explanation is that what's being passed is a pointer into
> catcache or relcache storage that isn't guaranteed to be valid for
> long enough. The given backtrace doesn't go down far enough to show
> where the bogus input came from, but I'm betting that something is
> returning to SQL a string it got from cache without pstrdup'ing it.

Yep, that was it -- the bug was in getObjectIdentityParts. I noticed
three other cases of missing pstrdup() and fixed those as well.
