Re: BUG #5418: psql exits after using tab-completion with error message

Lists: pgsql-bugs
From: "Ben Madin" <ben(at)ausvet(dot)com(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 08:39:13
Message-ID: 201004130839.o3D8dD6O033928@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 5418
Logged by: Ben Madin
Email address: ben(at)ausvet(dot)com(dot)au
PostgreSQL version: 8.4.3
Operating system: Mac OS X 10.6.3
Description: psql exits after using tab-completion with error message
Details:

G'day,

this problem appears to be intermittent, insofar as I don't always notice
it. It has been happening for a number of versions (since 8.3 at least) and
it might work or it might not, but I can't really pick what has changed when
it starts happening. Once it starts, it seems very hard to stop. Very
anecdotally, I think it only happens when the characters entered so far are
ambiguous (i.e. could match more than one table) and the tables were recently
added to the database.

Here is an example : (I wanted the abattoir table, so I had typed \d aba and
then pressed the tab key)

psql (8.4.3)
Type "help" for help.

prices=# SELECT version();
                                                                    version
------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.3 on i386-apple-darwin10.3.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) (dot 1), 64-bit
(1 row)

prices=# \set VERBOSITY verbose
prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
freed was not allocated
*** set a breakpoint in malloc_error_break to debug
Abort trap

but if I had only typed \d a, it would have just waited (as there are two
other tables starting with a)

I have tried :
1. restarting the terminal
2. restarting the PostgreSQL server
3. rebooting the system
4. a few cycles of upgrades
5. dropping and reloading the database itself.

and I don't know where else to go. I have no idea about what
malloc_error_break means or where to start.

I hope this is helpful enough, please let me know if you require further
information.

cheers

Ben


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Ben Madin <ben(at)ausvet(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 12:33:01
Message-ID: 4BC4647D.1060507@postnewspapers.com.au


> prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
> freed was not allocated
> *** set a breakpoint in malloc_error_break to debug
> Abort trap

This could be a bug in psql, a buggy/damaged readline library, etc.

For GUI apps Mac OS X makes a crash record in the system logs. I'm not
sure if it does that for command line apps too - I think it does. Can
you check "Console" and see if there are any crash dumps for psql?

If not, to find out what's going on it may be necessary to attach a
debugger. This probably isn't installed on your computer. You would need
the Developer Tools (Xcode etc.), which can be obtained from the Apple
web site as a free download.

I don't have access to Mac OS X 10.6, but maybe someone else here does
and can reproduce the issue. Even if they can, it might still be helpful
to get some additional info on the crash you're having, so it'd be great
if you could grab the developer tools and reply once they're installed. See:

http://developer.apple.com/technologies/xcode.html

You need a "dev center" username/password, but it's free to register and
they don't spam you. See:

http://developer.apple.com/programs/register/

(click "Get Started")

--
Craig Ringer


From: Ben Madin <ben(at)ausvet(dot)com(dot)au>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 13:16:49
Message-ID: B9728443-7FAB-4375-A8F2-8A94FD99E498@ausvet.com.au

G'day Craig, thanks for your reply.

On 13/04/2010, at 20:33 , Craig Ringer wrote:

>
>> prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
>> freed was not allocated
>> *** set a breakpoint in malloc_error_break to debug
>> Abort trap
>
> This could be a bug in psql, a buggy/damaged readline library, etc.
>
> For GUI apps Mac OS X makes a crash record in the system logs. I'm not sure if it does that for command line apps too - I think it does. Can you check "Console" and see if there are any crash dumps for psql?

I have checked console, and there are many psql entries - I have attached two as they all appear fairly similar, some numbers changing in this section :

Thread 0 crashed with X86 Thread State (64-bit):
rax: 0x0000000000000000 rbx: 0x0000000000000002 rcx: 0x00007fff5fbff448 rdx: 0x0000000000000000
rdi: 0x0000000000002efd rsi: 0x0000000000000006 rbp: 0x00007fff5fbff460 rsp: 0x00007fff5fbff448
r8: 0x0000000000000e03 r9: 0x0000000000000000 r10: 0x00007fff868cf8ca r11: 0x0000000000000202
r12: 0x00000001000d5000 r13: 0x00000001000d2000 r14: 0x0000000000000000 r15: 0x0000000000000003
rip: 0x00007fff868d3886 rfl: 0x0000000000000202 cr2: 0x0000000100188bd5

The ones that vary most are rdi, rcx, rbp, cr2 and r15.

Attachment                              Content-Type              Size
psql_2010-04-13-164804_murdoch.crash    application/octet-stream  5.5 KB
psql_2010-04-12-214359_murdoch.crash    application/octet-stream  5.5 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Ben Madin <ben(at)ausvet(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 19:28:33
Message-ID: 20100413192832.GF2990@alvh.no-ip.org

Ben Madin wrote:

> Bug reference: 5418
> Logged by: Ben Madin
> Email address: ben(at)ausvet(dot)com(dot)au
> PostgreSQL version: 8.4.3
> Operating system: Mac OS X 10.6.3
> Description: psql exits after using tab-completion with error message

Lots of problems have been reported with MacOSX's libreadline -- it is
said to be buggy. I think the recommendation is to install vanilla GNU
libreadline and compile Postgres against that one.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Ben Madin <ben(at)ausvet(dot)com(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 21:02:54
Message-ID: 10385.1271192574@sss.pgh.pa.us

Craig Ringer <craig(at)postnewspapers(dot)com(dot)au> writes:
>> prices=# \d abapsql(11407) malloc: *** error for object 0xe: pointer being
>> freed was not allocated
>> *** set a breakpoint in malloc_error_break to debug
>> Abort trap

> This could be a bug in psql, a buggy/damaged readline library, etc.
> ...
> I don't have access to Mac OS X 10.6, but maybe someone else here does
> and can reproduce the issue.

It's fairly easy to reproduce in the regression database:
type "\d ten<TAB>". I'm not sure what the triggering condition
is exactly, because some seemingly-similar cases don't fail,
for instance "\d test<TAB>" works as expected, ditto "\d t<TAB>".

Stack trace looks like this:

regression=# \d tenpsql(16771) malloc: *** error for object 0xd: pointer being freed was not allocated
*** set a breakpoint in malloc_error_break to debug

Program received signal SIGABRT, Aborted.
0x00007fff83652886 in __kill ()
(gdb) bt
#0 0x00007fff83652886 in __kill ()
#1 0x00007fff836f2eae in abort ()
#2 0x00007fff8360aa75 in free ()
#3 0x000000010009b9a8 in fn_complete ()
#4 0x00000001000a1416 in rl_complete ()
#5 0x00000001000a1428 in rl_complete ()
#6 0x000000010009fb87 in el_gets ()
#7 0x00000001000a19bf in readline ()
#8 0x00000001000083ff in gets_interactive (prompt=<value temporarily unavailable, due to optimizations>) at input.c:76
#9 0x000000010000bfdb in MainLoop (source=0x7fff705a30c0) at mainloop.c:134
#10 0x000000010000e6d4 in main (argc=<value temporarily unavailable, due to optimizations>, argv=0x7fff5fbff510) at startup.c:305

The object address is nonreproducible (varies even in seemingly
identical test runs), but it's always a very small integer, 1 to 0xd or
so.

Since this doesn't happen on any of my libreadline-using boxes, it seems
like a fairly safe bet that it's a bug in libedit, rather than us using
the library incorrectly. You can try to get Apple to take an interest,
but there's not much we can do about it.

I concur with Alvaro's suggestion to install GNU readline instead of
depending on libedit.

regards, tom lane


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Ben Madin <ben(at)ausvet(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 23:08:45
Message-ID: 4BC4F97D.1070702@postnewspapers.com.au

On 13/04/10 21:16, Ben Madin wrote:

> I have checked console, and there are many psql entries - I have attached two as they all appear fairly similar, some numbers changing in this section :
>
> Thread 0 crashed with X86 Thread State (64-bit):
> rax: 0x0000000000000000 rbx: 0x0000000000000002 rcx: 0x00007fff5fbff448 rdx: 0x0000000000000000
> rdi: 0x0000000000002efd rsi: 0x0000000000000006 rbp: 0x00007fff5fbff460 rsp: 0x00007fff5fbff448
> r8: 0x0000000000000e03 r9: 0x0000000000000000 r10: 0x00007fff868cf8ca r11: 0x0000000000000202
> r12: 0x00000001000d5000 r13: 0x00000001000d2000 r14: 0x0000000000000000 r15: 0x0000000000000003
> rip: 0x00007fff868d3886 rfl: 0x0000000000000202 cr2: 0x0000000100188bd5
>
> The ones that vary most are rdi, rcx, rbp, cr2 and r15.

Darn. There's no more information, like a numbered list of functions
(stack trace), list of linked libraries, etc? Maybe OS X only generates
that for GUI app crashes.

The stack trace is really what's needed. While it's possible to figure
out where a program crashed based on the thread state dump as shown
above, it doesn't give you any information about how it got there - and
that can be rather helpful.

Is there any chance you can run psql under gdb from the developer tools
and reproduce the fault that way? Then, when it crashes, get a backtrace?

Since you're clearly pretty familiar with the shell, I'll just
illustrate how to do it:

$ gdb --quiet --args psql [any psql params here]
(gdb) run
[ do whatever you need to do to make psql crash ]
Program received signal SIGSEGV, Segmentation fault.
0x007ca422 in __kernel_vsyscall ()
(gdb) bt
#0 0x007ca422 in __kernel_vsyscall ()
#1 0x001cddd3 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2 0x00de68c7 in rl_getc () from /lib/libreadline.so.6
#3 0x00de6ea3 in rl_read_key () from /lib/libreadline.so.6
#4 0x00dd109e in readline_internal_char () from /lib/libreadline.so.6
#5 0x00dd15ed in readline () from /lib/libreadline.so.6
#6 0x00730ff1 in ?? ()
#7 0x00733ebe in ?? ()
#8 0x00737964 in main ()
(gdb)

If you paste all the output after "run", that'd be really handy.

If for some reason you can't start psql under gdb, you can instead run
psql normally and then attach gdb to it using "gdb -p pidofpsql". Get
"pidofpsql" from the "ps" command ("ps -ef" or "ps aux"; I don't
remember which flavour Mac OS X understands), piped through
"| grep psql".

> I have the developer tools installed - but I think only because I needed them installed to install something ages ago.

Great.

--
Craig Ringer


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Ben Madin <ben(at)ausvet(dot)com(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-13 23:16:27
Message-ID: 4BC4FB4B.4040106@postnewspapers.com.au

On 13/04/10 21:16, Ben Madin wrote:
> G'day Craig, thanks for your reply.

Please disregard my follow-up. I hadn't seen Tom's reply that he was
able to reproduce the issue. There's no need for you to collect a
backtrace now :-)

--
Craig Ringer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Ben Madin <ben(at)ausvet(dot)com(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-15 04:13:51
Message-ID: 22441.1271304831@sss.pgh.pa.us

I wrote:
> It's fairly easy to reproduce in the regression database:
> type "\d ten<TAB>". I'm not sure what the triggering condition
> is exactly, because some seemingly-similar cases don't fail,
> for instance "\d test<TAB>" works as expected, ditto "\d t<TAB>".

It turns out that the problem occurs when there are exactly 9 + 10*N
possible completions, for any N>=0. There's an off-by-one logic bug
in libedit that results in a memory stomp because it forgets to enlarge
an array before storing a terminating null pointer in it.

The upstream NetBSD sources incorporated a fix some time ago:
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/readline.c.diff?r1=1.82&r2=1.83&sortby=date&f=h
with credit to Caleb Welton at Greenplum --- I wonder if he found it
because of psql failing? Apple hasn't incorporated this fix as of
OS X 10.6.3, however.

What's slightly more distressing is that the same source file
(readline.c) appears to have at least two other occurrences of the same
broken array-enlargement coding pattern, which were *not* fixed.

I've reported this to Apple but I'm not real sure where to file NetBSD
bugs. Anybody want to yank the BSD guys' chain about the other errors?

regards, tom lane


From: Ben Madin <ben(at)ausvet(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-15 13:43:17
Message-ID: D500103B-E461-4213-974E-2E70F5DD817F@ausvet.com.au

Thanks Tom,

I also reported it to Apple, but without this information, so I hope they get the sense that it might be important enough to look at, especially if there is already a fix known.

I also contacted William Kyngesbury. He was sympathetic, though he has never hit the problem himself (he doesn't use tab-completion), but he would rather Apple fixed it, because his GIS software suite relies heavily on Apple OS X frameworks where they exist.

cheers

Ben

On 15/04/2010, at 12:13 , Tom Lane wrote:

> I wrote:
>> It's fairly easy to reproduce in the regression database:
>> type "\d ten<TAB>". I'm not sure what the triggering condition
>> is exactly, because some seemingly-similar cases don't fail,
>> for instance "\d test<TAB>" works as expected, ditto "\d t<TAB>".
>
> It turns out that the problem occurs when there are exactly 9 + 10*N
> possible completions, for any N>=0. There's an off-by-one logic bug
> in libedit that results in a memory stomp because it forgets to enlarge
> an array before storing a terminating null pointer in it.
>
> The upstream NetBSD sources incorporated a fix some time ago:
> http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libedit/readline.c.diff?r1=1.82&r2=1.83&sortby=date&f=h
> with credit to Caleb Welton at Greenplum --- I wonder if he found it
> because of psql failing? Apple hasn't incorporated this fix as of
> OS X 10.6.3, however.
>
> What's slightly more distressing is that the same source file
> (readline.c) appears to have at least two other occurrences of the same
> broken array-enlargement coding pattern, which were *not* fixed.
>
> I've reported this to Apple but I'm not real sure where to file NetBSD
> bugs. Anybody want to yank the BSD guys' chain about the other errors?
>
> regards, tom lane

--

Ben Madin
AusVet Animal Health Services
P.O. Box 5467
Broome WA 6725
Australia

t : +61 8 9192 5455
f : +61 8 9192 5535
m : 0448 887 220
e : ben(at)ausvet(dot)com(dot)au

AusVet's website: http://www.ausvet.com.au



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ben Madin <ben(at)ausvet(dot)com(dot)au>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5418: psql exits after using tab-completion with error message
Date: 2010-04-15 13:49:59
Message-ID: 830.1271339399@sss.pgh.pa.us

Ben Madin <ben(at)ausvet(dot)com(dot)au> writes:
> I also reported it to Apple, but without this information, so I hope they get the sense that it might be important enough to look at, especially if there is already a fix known.

Mine is problem ID 7866382, if you'd like to add a note to yours
pointing out the duplication.

If this is biting you on a regular basis, one easy workaround would be
to add a dummy table or index to change the number of possible
completions.

regards, tom lane