Re: 7.2.1 backend crash (convert_string_datum, locale)

Lists: pgsql-bugs
From: Mats Lofkvist <mal(at)algonet(dot)se>
To: pgsql-bugs(at)postgresql(dot)org
Subject: 7.2.1 backend crash (convert_string_datum, locale)
Date: 2002-07-11 15:29:12
Message-ID: y2q8z4ijnav.fsf@algonet.se


Hi,

When testing postgres 7.2.1 on a sparc/solaris8 box with
--enable-locale --enable-multibyte I get a crash in
convert_string_datum.

The backend just dies when doing a select. With casserts
and debug configured in I got the following in the log:

NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE: AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7818
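
The NOTICE lines above report a write past the end of an allocated chunk. A minimal sketch of how a check of that kind can work, assuming a sentinel byte stored just past the requested size. The names chunk_header, chunk_alloc and chunk_free and the SENTINEL value are hypothetical; this is an illustration, not the PostgreSQL allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SENTINEL 0x7E   /* arbitrary marker byte, chosen for this sketch */

/* Hypothetical chunk header: remembers the rounded-up chunk size and
 * the number of bytes the caller actually asked for. */
typedef struct {
    size_t size;            /* usable bytes in the chunk */
    size_t requested_size;  /* bytes the caller requested */
} chunk_header;

/* Allocate `requested` bytes plus one sentinel byte, rounded up to a
 * multiple of 16, and plant the sentinel right after the caller's data. */
static void *chunk_alloc(size_t requested)
{
    size_t size = (requested + 1 + 15) & ~(size_t) 15;
    chunk_header *h = malloc(sizeof *h + size);
    unsigned char *data;

    if (h == NULL)
        return NULL;
    h->size = size;
    h->requested_size = requested;
    data = (unsigned char *) (h + 1);
    data[requested] = SENTINEL;
    return data;
}

/* Free a chunk. Returns 1 if the sentinel is intact, 0 if something
 * wrote past the requested size before the chunk was freed. */
static int chunk_free(void *p)
{
    chunk_header *h;
    unsigned char *data = p;
    int intact;

    assert(p != NULL);
    h = (chunk_header *) p - 1;
    intact = (data[h->requested_size] == SENTINEL);
    free(h);
    return intact;
}
```

The rounding rule was picked so that a 17-byte request yields a 32-byte chunk, which matches the size = 32, requested_size = 17 pair in the gdb output further down; the real allocator's rule may differ.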

Gdb on the crashing backend says:

Program received signal SIGSEGV, Segmentation fault.
0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446 AssertArg(MemoryContextIsValid(header->context));
(gdb) where
#0 0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
#1 0x21844c in convert_string_datum (value=5251848, typid=1043)
at selfuncs.c:2059
#2 0x217978 in convert_to_scalar (value=4947304, valuetypid=1043,
scaledvalue=0xffbee0b8, lobound=5251848, hibound=4946632,
boundstypid=1043, scaledlobound=0xffbee0a8, scaledhibound=0xffbee0b0)
at selfuncs.c:1763
#3 0x214f8c in scalarineqsel (root=0x4aebe8, operator=1066, isgt=0 '\000',
var=0x4b6218, other=0x4b76d8) at selfuncs.c:584
#4 0x21541c in scalarltsel (fcinfo=0xffbee258) at selfuncs.c:733
#5 0x25aa90 in DirectFunctionCall4 (func=0x215304 <scalarltsel>,
arg1=4910056, arg2=1066, arg3=4947368, arg4=0) at fmgr.c:725
#6 0x2199f0 in prefix_selectivity (root=0x4aebe8, var=0x4b6218,
prefix=0x4b7ce8 "SY") at selfuncs.c:2667
#7 0x215854 in patternsel (fcinfo=0xffbee518, ptype=Pattern_Type_Like)
at selfuncs.c:872
#8 0x215a18 in likesel (fcinfo=0xffbee518) at selfuncs.c:913
#9 0x25c5e4 in OidFunctionCall4 (functionId=1819, arg1=4910056, arg2=1213,
arg3=4941064, arg4=1) at fmgr.c:1218
#10 0x185128 in restriction_selectivity (root=0x4aebe8, operator=1213,
args=0x4b6508, varRelid=1) at plancat.c:232
#11 0x167530 in clauselist_selectivity (root=0x4aebe8, clauses=0x4b7678,
varRelid=1) at clausesel.c:156
#12 0x167394 in restrictlist_selectivity (root=0x4aebe8,
restrictinfo_list=0x4b6958, varRelid=1) at clausesel.c:74
#13 0x16a044 in set_baserel_size_estimates (root=0x4aebe8, rel=0x4b6af8)
at costsize.c:1146
#14 0x166ae0 in set_plain_rel_pathlist (root=0x4aebe8, rel=0x4b6af8,
rte=0x4aec78) at allpaths.c:132
#15 0x166aa4 in set_base_rel_pathlists (root=0x4aebe8) at allpaths.c:115
#16 0x1667ec in make_one_rel (root=0x4aebe8) at allpaths.c:62
#17 0x177708 in subplanner (root=0x4aebe8, flat_tlist=0x4b6a18,
tuple_fraction=0) at planmain.c:238
#18 0x177544 in query_planner (root=0x4aebe8, tlist=0x4b5ed8, tuple_fraction=0)
at planmain.c:126
#19 0x17939c in grouping_planner (parse=0x4aebe8, tuple_fraction=0)
at planner.c:1094
#20 0x177d70 in subquery_planner (parse=0x4aebe8, tuple_fraction=-1)
at planner.c:228
#21 0x177a2c in planner (parse=0x4aebe8) at planner.c:94
#22 0x1c821c in pg_plan_query (querytree=0x4aebe8) at postgres.c:513
#23 0x1c871c in pg_exec_query_string (
query_string=0x4ae278 "SELECT find0.userId AS userId, find0.longValue AS findLongValue0 FROM userData find0 WHERE find0.groupName='user' AND find0.attributeName LIKE 'login%' AND find0.value LIKE 'SY%'", dest=Remote,
parse_context=0x464598) at postgres.c:784
#24 0x1ca63c in PostgresMain (argc=4, argv=0xffbef018,
username=0x4607e1 "mats") at postgres.c:1926
#25 0x18bab0 in DoBackend (port=0x4606b0) at postmaster.c:2243
#26 0x18af48 in BackendStartup (port=0x4606b0) at postmaster.c:1874
#27 0x189548 in ServerLoop () at postmaster.c:995
#28 0x188d18 in PostmasterMain (argc=1, argv=0x447db0) at postmaster.c:771
#29 0x143ebc in main (argc=1, argv=0xffbefacc) at main.c:206
(gdb) up
#1 0x21844c in convert_string_datum (value=5251848, typid=1043)
at selfuncs.c:2059
2059 pfree(val);
(gdb) print val
$1 = 0x4b7878 "D1BFD67F71192ECE"
(gdb) print xfrmstr
$2 = 0x4b78d8 "\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001\001S\001Q\001S\0015\001<\0014\0014\001:\001T\001:\0019\001R\001T\001P\0014\001R\001\001\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001"
(gdb) print xfrmsize
$3 = 48
(gdb) print xfrmlen
$4 = 102
(gdb) print *(varattrib *)(value)
$5 = {va_header = 20, va_content = {va_compressed = {va_rawsize = 1144078918,
va_data = "D"}, va_external = {va_rawsize = 1144078918,
va_extsize = 1144403782, va_valueid = 925970745,
va_toastrelid = 843400005}, va_data = "D"}}
(gdb) print (char *)((varattrib *)(value))->va_content.va_data
$6 = 0x50230c "D1BFD67F71192ECE~", '\177' <repeats 183 times>...
(gdb) list
2054 /* Oops, didn't make it */
2055 pfree(xfrmstr);
2056 xfrmstr = (char *) palloc(xfrmlen + 1);
2057 xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
2058 }
2059 pfree(val);
2060 val = xfrmstr;
2061 #endif
2062
2063 return (unsigned char *) val;
(gdb) down
#0 0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446 AssertArg(MemoryContextIsValid(header->context));
(gdb) print header
$7 = (StandardChunkHeader *) 0x4b7868
(gdb) print *header
$8 = {context = 0x15246b8, size = 32, requested_size = 17}
(gdb)
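
The listing at lines 2054-2057 retries strxfrm after the first buffer turned out too small (xfrmsize = 48 against a reported xfrmlen = 102). A minimal sketch of an alternative that asks strxfrm for the needed length before allocating anything, which the C standard permits when the size argument is 0. The helper name xfrm_dup is hypothetical, it uses malloc rather than palloc, and it is not presented as the fix for this crash:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of `val` transformed by strxfrm for the
 * current LC_COLLATE locale, or NULL on failure. Caller frees it. */
static char *xfrm_dup(const char *val)
{
    size_t need;
    size_t got;
    char *out;

    assert(val != NULL);

    /* First pass: size 0 and a null destination only report the
     * length of the transformed string, nothing is written. */
    need = strxfrm(NULL, val, 0);

    out = malloc(need + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: the buffer is known to be large enough. */
    got = strxfrm(out, val, need + 1);
    if (got > need) {
        /* The two passes disagree; do not trust the result. */
        free(out);
        return NULL;
    }
    return out;
}
```

Because the destination is never smaller than the reported length, the second call has no reason to truncate or to touch memory beyond need + 1 bytes.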

Please let me know if there is more info I can get out of
gdb to track this down.

_
Mats Lofkvist
mal(at)algonet(dot)se


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mats Lofkvist <mal(at)algonet(dot)se>
Cc: pgsql-bugs(at)postgresql(dot)org, Andrew Sullivan <andrew(at)libertyrms(dot)info>
Subject: Re: 7.2.1 backend crash (convert_string_datum, locale)
Date: 2002-07-12 03:15:42
Message-ID: 6446.1026443742@sss.pgh.pa.us
Lists: pgsql-bugs

Mats Lofkvist <mal(at)algonet(dot)se> writes:
> When testing postgres 7.2.1 on a sparc/solaris8 box with
> --enable-locale --enable-multibyte I get a crash in
> convert_string_datum.

This smells like a problem that we chased down awhile back, that
snprintf on Solaris is broken (it will write past the end of the
specified buffer length, thus corrupting adjacent data).
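
One way to check a given libc for the behaviour described here is to surround the destination with guard bytes and see whether snprintf touches anything beyond the stated limit. A minimal sketch; the function name snprintf_respects_limit and the GUARD value are made up for illustration, and a passing result on one input does not prove the library is correct for every format or locale:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GUARD 0x5A   /* arbitrary fill byte used to detect stray writes */

/* Returns 1 if snprintf kept its output, including the terminating
 * NUL, inside the first `limit` bytes of the buffer; 0 otherwise. */
static int snprintf_respects_limit(size_t limit, const char *text)
{
    unsigned char buf[256];
    size_t i;

    assert(limit > 0 && limit < sizeof buf);

    /* Fill the whole buffer so any byte written is visible. */
    memset(buf, GUARD, sizeof buf);

    snprintf((char *) buf, limit, "%s", text);

    /* Everything at or beyond `limit` must still hold the guard. */
    for (i = limit; i < sizeof buf; i++)
        if (buf[i] != GUARD)
            return 0;

    /* A conforming snprintf also NUL-terminates within the limit. */
    return memchr(buf, '\0', limit) != NULL;
}
```

Calling it with a limit shorter than the text, for example snprintf_respects_limit(8, "D1BFD67F71192ECE"), exercises the truncation path where an overrun would show up.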

Andrew, I think that was your test case we found it on. Do you
recall if a fix is available from Sun?

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mats Lofkvist <mal(at)algonet(dot)se>, pgsql-bugs(at)postgresql(dot)org, Andrew Sullivan <andrew(at)libertyrms(dot)info>
Subject: Re: 7.2.1 backend crash (convert_string_datum, locale)
Date: 2002-07-12 03:36:06
Message-ID: 200207120336.g6C3a6x09284@candle.pha.pa.us
Lists: pgsql-bugs

Tom Lane wrote:
> Mats Lofkvist <mal(at)algonet(dot)se> writes:
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).
>
> Andrew, I think that was your test case we found it on. Do you
> recall if a fix is available from Sun?

Yes, I remember this too. It was specifically multibyte-related.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Andrew Sullivan <andrew(at)libertyrms(dot)info>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mats Lofkvist <mal(at)algonet(dot)se>, pgsql-bugs(at)postgresql(dot)org, Andrew Sullivan <andrew(at)libertyrms(dot)info>
Subject: Re: 7.2.1 backend crash (convert_string_datum, locale)
Date: 2002-07-12 03:58:45
Message-ID: 20020711235845.A17209@mail.libertyrms.com
Lists: pgsql-bugs

On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
> Mats Lofkvist <mal(at)algonet(dot)se> writes:
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).

It does indeed. This was only the 64-bit library, though, or at
least as far as we were able to tell. And I wasn't able to turn up
any evidence that it happened on Solaris 8. But it might. We don't
use 8, at least not yet.

> Andrew, I think that was your test case we found it on. Do you
> recall if a fix is available from Sun?

Not as far as I know, at least for 7. Come to think of it, I now
_do_ recall seeing something in my various Google wanderings which
suggested that there is a fix in one of the patch packages for
Solaris 8 (which suggests the buggy library is in the basic Solaris 8
install). I dimly recall some mention of incompatibility between it
and some other patchlevel, as well, so it might require some digging.
(Given that it's really a bounds mistake in a system library, you'd
think that it'd be easier to find more information about it; I
actually learned almost everything I know about the problem from,
IIRC, the autoconf web pages, so I'd not expect a cursory search of
Sun's site to turn anything up.)

In the FAQ_Solaris, there is a suggestion to use the substitute
function included in the Postgres tree (which is what you suggested,
Tom, and what I did), as well as instructions on how to do it. It
definitely works for me on Solaris 7. Might be worth trying on 8 as
well. If so, the FAQ should be updated so as not to limit the
discussion to Solaris 7 and earlier.

Sorry I can't be more help than this.

A

--
----
Andrew Sullivan 87 Mowat Avenue
Liberty RMS Toronto, Ontario Canada
<andrew(at)libertyrms(dot)info> M6K 3E3
+1 416 646 3304 x110


From: Mats Lofkvist <mal(at)algonet(dot)se>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 7.2.1 backend crash (convert_string_datum, locale)
Date: 2002-07-15 12:59:19
Message-ID: y2qn0st4060.fsf@algonet.se
Lists: pgsql-bugs

andrew(at)libertyrms(dot)info (Andrew Sullivan) writes:
> On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
> > Mats Lofkvist <mal(at)algonet(dot)se> writes:
> > > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > > --enable-locale --enable-multibyte I get a crash in
> > > convert_string_datum.
> >
> > This smells like a problem that we chased down awhile back, that
> > snprintf on Solaris is broken (it will write past the end of the
> > specified buffer length, thus corrupting adjacent data).
>
> It does indeed. This was only the 64-bit library, though, or at
> least as far as we were able to tell. And I wasn't able to turn up
> any evidence that it happened on Solaris 8. But it might. We don't
> use 8, at least not yet.
>
> > Andrew, I think that was your test case we found it on. Do you
> > recall if a fix is available from Sun?
>
> Not as far as I know, at least for 7. Come to think of it, I now
> _do_ recall seeing something in my various Google wanderings which
> suggested that there is a fix in one of the patch packages for
> Solaris 8 (which suggests the buggy library is in the basic Solaris 8
> install). I dimly recall some mention of incompatibility between it
> and some other patchlevel, as well, so it might require some digging.
> (Given that it's really a bounds mistake in a system library, you'd
> think that it'd be easier to find more information about it; I
> actually learned almost everything I know about the problem from,
> IIRC, the autoconf web pages, so I'd not expect a cursory search of
> Sun's site to turn anything up.)
>
> In the FAQ_Solaris, there is a suggestion to use the substitute
> function included in the Postgres tree (which is what you suggested,
> Tom, and what I did), as well as instructions on how to do it. It
> definitely works for me on Solaris 7. Might be worth trying on 8 as
> well. If so, the FAQ should be updated so as not to limit the
> discussion to Solaris 7 and earlier.

I didn't get it to work with the stuff in FAQ_Solaris (can't
guarantee I really got snprintf substituted though, just
followed the instructions and recompiled).

Removing --enable-multibyte didn't help either.

Without either --enable-locale or --enable-multibyte it
seems to work, but since I had to create a new database when
removing locale, any problems local to the first database
are no longer visible.

Is postgres 8-bit clean without locale support enabled?
(I don't care about sort orders and such, only need to
read/write 8-bit chars via jdbc).

_
Mats Lofkvist
mal(at)algonet(dot)se


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mats Lofkvist <mal(at)algonet(dot)se>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 7.2.1 backend crash (convert_string_datum, locale)
Date: 2002-07-15 13:56:15
Message-ID: 16056.1026741375@sss.pgh.pa.us
Lists: pgsql-bugs

Mats Lofkvist <mal(at)algonet(dot)se> writes:
> Without either --enable-locale or --enable-multibyte it
> seems to work, but since I had to create a new database when
> removing locale, any problems local to the first database
> are no longer visible.

Hm. If the database is already corrupt then simply recompiling
a corrected binary isn't going to magically make things perfect.
Maybe you should retry the snprintf patch and/or --enable-multibyte
using fresh databases.

> Is postgres 8-bit clean without locale support enabled?
> (I don't care about sort orders and such, only need to
> read/write 8-bit chars via jdbc).

In that case you don't really need locale, no. Not sure about
whether you need multibyte; does JDBC expect Unicode support?

regards, tom lane