updating to 7.4.13 helped it appears

Lists: pgsql-general
From: Geoffrey <esoteric(at)3times25(dot)net>
To: PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: updating to 7.4.13 helped it appears
Date: 2006-10-31 17:03:39
Message-ID: 454781EB.6010703@3times25.net

It appears that upgrading to 7.4.13 helped the problem we were having
with the postgres process terminating. We still are having the problem,
but it does appear to be different, based on the output of backtraces.
The core files are much larger and there does seem to be a common thread
amongst most of them. I've attached one to see if anyone has any ideas
as to what our problem might be. Suggestions would be appreciated.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin

Attachment: gdb16444.out (text/plain, 3.0 KB)

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Geoffrey <esoteric(at)3times25(dot)net>
Cc: PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: Re: updating to 7.4.13 helped it appears
Date: 2006-10-31 17:57:11
Message-ID: 20061031175711.GC9247@alvh.no-ip.org

Geoffrey wrote:
> It appears that upgrading to 7.4.13 helped the problem we were having
> with the postgres process terminating. We still are having the problem,
> but it does appear to be different, based on the output of backtraces.
> The core files are much larger and there does seem to be a common thread
> amongst most of them. I've attached one to see if anyone has any ideas
> as to what our problem might be. Suggestions would be appreciated.

I don't think this backtrace makes much sense. Did you compile with
--enable-debug?

Are you sure you are passing the same postgres executable to GDB that
was used to actually generate the core (i.e. the one that's running)?
Is this core file generated from exactly that executable, or is it maybe
one that was generated with an older executable?

> Using host libthread_db library "/lib/tls/libthread_db.so.1".
> Core was generated by `postgres: msanchez exp 198.212.166.29 SELECT '.
> Program terminated with signal 11, Segmentation fault.
> #0 0x0815d950 in cost_mergejoin (path=0x836f20c, root=0x8370708)
> at costsize.c:915
> 915 if (rescannedtuples < 0)
> #0 0x0815d950 in cost_mergejoin (path=0x836f20c, root=0x8370708)
> at costsize.c:915
> #1 0x0815d98c in cost_mergejoin (path=0x836f20c, root=0x8370300)
> at costsize.c:932
> #2 0x0815d98c in cost_mergejoin (path=0x836f20c, root=0x836f2ec)
> at costsize.c:932
> #3 0x0815d8c1 in cost_mergejoin (path=0x836f20c, root=0x775360)
> at costsize.c:878
> #4 0x0815c428 in clauselist_selectivity (root=0x827d6e4, clauses=0xfeff7798,
> varRelid=-16812072, jointype=135970173) at clausesel.c:203
> #5 0x081637f5 in get_cheapest_path_for_pathkeys (paths=0x827d6e4,
> pathkeys=0x882eeb0, cost_criterion=7) at pathkeys.c:586
> #6 0x081abd7d in ProcessUtility (parsetree=0x835ea10, dest=0x882ee54,
> completionTag=0xa <Address 0xa out of bounds>) at utility.c:611
> #7 0x081ac1ff in ProcessUtility (parsetree=0xfeff7850, dest=0x88af7e0,
> completionTag=0x88af408 "\v") at utility.c:793
> #8 0x081082c4 in CreateTrigger (stmt=0x88af588, forConstraint=-100 '\234')
> at trigger.c:155
> #9 0x08109e30 in CopyTriggerDesc (trigdesc=0x88af588) at trigger.c:922
> #10 0x0810aa4b in ExecBSDeleteTriggers (estate=0x88af828, relinfo=0x88af408)
> at trigger.c:1324
> #11 0x0810ae71 in ExecASUpdateTriggers (estate=0x88af380, relinfo=0x8112de0)
> at trigger.c:1462
> #12 0x08112ec9 in AlterUserSet (stmt=0x88af380) at user.c:1002
> #13 0x08106d66 in createForeignKeyTriggers (rel=0x88af380, fkconstraint=0x1,
> constrOid=141463696) at tablecmds.c:3697
> #14 0x08113718 in CreateGroup (stmt=0x88af2f8) at user.c:1273
> #15 0x08106e11 in createForeignKeyTriggers (rel=0x88af2f8, fkconstraint=0x413,
> constrOid=4278155864) at tablecmds.c:3714
> #16 0x081055fd in AlterTableAddCheckConstraint (rel=0x88af1a8,
> constr=0x88af2f8) at tablecmds.c:2951
> #17 0x081049d8 in AlterTableAlterOids (myrelid=139861992, recurse=-88 '?',
> setOid=-40 '?') at tablecmds.c:2513
> #18 0x0817c33b in BackendFork (port=0x8360fd8) at postmaster.c:2485
> #19 0x0817c113 in BackendFork (port=0x8360fd8) at postmaster.c:2403
> #20 0x081788d3 in PGSemaphoreLock (sema=0x835cbc0, interruptOK=0 '\0')
> at pg_sema.c:424
> #21 0x0817b1c1 in pmdie (postgres_signal_arg=4) at postmaster.c:1701
> #22 0x08154c40 in _readConst () at readfuncs.c:377
> #23 0x08154633 in _readResdom () at readfuncs.c:311
> #24 0x08152b98 in _outAExpr (str=0x2, node=0x1) at outfuncs.c:1366
> #25 0x0815225e in _outIndexElem (str=0x5, node=0x830d6b8) at outfuncs.c:1208
> #26 0x08121f63 in ExecEndNode (node=0x5) at execProcnode.c:499
> #27 0x0065479a in ?? ()
> #28 0x00000005 in ?? ()
> #29 0xfeff8c64 in ?? ()
> #30 0xfeff8c7c in ?? ()
> #31 0x00000000 in ?? ()
> #32 0x00774a78 in ?? ()
> #33 0x0013a020 in ?? ()
> #34 0x081fbd18 in interval_part (fcinfo=0x8121d30) at timestamp.c:3374
> #35 0x0806fd51 in nocachegetattr (tuple=) at heaptuple.c:409

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Geoffrey <esoteric(at)3times25(dot)net>
To: PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: Re: updating to 7.4.13 helped it appears
Date: 2006-10-31 18:22:04
Message-ID: 4547944C.4050502@3times25.net

Alvaro Herrera wrote:
> Geoffrey wrote:
>> It appears that upgrading to 7.4.13 helped the problem we were having
>> with the postgres process terminating. We still are having the problem,
>> but it does appear to be different, based on the output of backtraces.
>> The core files are much larger and there does seem to be a common thread
>> amongst most of them. I've attached one to see if anyone has any ideas
>> as to what our problem might be. Suggestions would be appreciated.
>
> I don't think this backtrace makes much sense. Did you compile with
> --enable-debug?

It didn't make much sense to me either, but then, I'm not familiar with
the postgres code. :( Is this a gcc flag? I did compile it with the -g
option, but I don't see an --enable-debug in the gcc man page.

> Are you sure you are passing the same postgres executable to GDB that
> was used to actually generate the core (i.e. the one that's running)?
> Is this core file generated from exactly that executable, or is it maybe
> one that was generated with an older executable?

The core files were generated on a machine that does not have postgres
compiled with debugging information, thus I built from source for the
same version on another machine and ran gdb against it and the generated
core file. I've done this in the past with different applications and
was successful in debugging the core file. If you believe this is not
generating an accurate trace, then I'll need to rebuild postgres on the
production machine (which is not what I wanted to do).

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Geoffrey <esoteric(at)3times25(dot)net>
Cc: PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: Re: updating to 7.4.13 helped it appears
Date: 2006-10-31 18:31:55
Message-ID: 20061031183155.GB12008@alvh.no-ip.org

Geoffrey wrote:
> Alvaro Herrera wrote:
> >Geoffrey wrote:
> >>It appears that upgrading to 7.4.13 helped the problem we were having
> >>with the postgres process terminating. We still are having the problem,
> >>but it does appear to be different, based on the output of backtraces.
> >>The core files are much larger and there does seem to be a common thread
> >>amongst most of them. I've attached one to see if anyone has any ideas
> >>as to what our problem might be. Suggestions would be appreciated.
> >
> >I don't think this backtrace makes much sense. Did you compile with
> >--enable-debug?
>
> It didn't make much sense to me either, but then, I'm not familiar with
> the postgres code. :( Is this a gcc flag? I did compile it with the -g
> option, but I don't see an --enable-debug in the gcc man page.

--enable-debug is a flag to "configure". It'll automatically add -g to
CFLAGS. (I'm not sure if it does anything else, but it's easier than
specifying that yourself.)
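
For illustration, a minimal sketch of such a build is shown here. The
helper name build_debug_postgres and the source directory argument are
hypothetical; only the --enable-debug option to configure comes from
the message above.

```shell
#!/bin/sh
# Hypothetical helper: configure and build a PostgreSQL source tree with
# debugging symbols. The directory passed in is an assumption; point it at
# wherever the source tarball was unpacked.
build_debug_postgres() {
    src_dir=$1
    (
        # Work in a subshell so the caller's working directory is unchanged.
        cd "$src_dir" || exit 1
        # --enable-debug is an option to configure, not to gcc.
        ./configure --enable-debug && make
    )
}

# Usage (not run here): build_debug_postgres /path/to/postgresql-7.4.13
```

Installing the result is left out on purpose, since where and how to
install depends on the machine.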

> >Are you sure you are passing the same postgres executable to GDB that
> >was used to actually generate the core (i.e. the one that's running)?
> >Is this core file generated from exactly that executable, or is it maybe
> >one that was generated with an older executable?
>
> The core files were generated on a machine that does not have postgres
> compiled with debugging information, thus I built from source for the
> same version on another machine and ran gdb against it and the generated
> core file. I've done this in the past with different applications and
> was successful in debugging the core file. If you believe this is not
> generating an accurate trace, then I'll need to rebuild postgres on the
> production machine (which is not what I wanted to do).

I'm not 100% sure what you are saying here, but if it is what I believe,
then you didn't copy the newly compiled executable into the production
machine; that won't work. You need to use a debug-enabled executable
both to produce the core file, and to pass to GDB for inspection.

On the other hand, if you can reproduce the failure on the development
machine, that core file would serve just fine. (You'd only need to copy
the tables and relevant data from production to said machine).
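
As a rough sketch of the point above, the following checks that the
executable handed to GDB is byte-for-byte the one that produced the core
before asking for a backtrace. The helper names same_binary and
backtrace_core, and all paths, are hypothetical; it assumes a copy of the
executable from the crashing machine is available for comparison.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only when both files have identical
# checksum and size.
same_binary() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Hypothetical helper: print a backtrace only if the local executable
# matches the one that was running when the core was dumped.
backtrace_core() {
    running_copy=$1   # copy of the executable from the crashing machine
    local_binary=$2   # executable you intend to pass to gdb
    core_file=$3
    if ! same_binary "$running_copy" "$local_binary"; then
        echo "executables differ: the backtrace would be unreliable" >&2
        return 1
    fi
    # -batch exits after running the commands; -ex bt prints the backtrace.
    gdb -batch -ex bt "$local_binary" "$core_file"
}

# Usage (not run here):
#   backtrace_core ./postgres.from-production /usr/local/pgsql/bin/postgres core
```

A separately compiled executable of the same version will normally fail
this check, which is the situation described in the thread.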

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Geoffrey <esoteric(at)3times25(dot)net>
To: PostgreSQL List <pgsql-general(at)postgresql(dot)org>
Subject: Re: updating to 7.4.13 helped it appears
Date: 2006-10-31 19:00:12
Message-ID: 45479D3C.7050800@3times25.net

Alvaro Herrera wrote:

> I'm not 100% sure what you are saying here, but if it is what I believe,
> then you didn't copy the newly compiled executable into the production
> machine; that won't work. You need to use a debug-enabled executable
> both to produce the core file, and to pass to GDB for inspection.

This is correct, I did not copy the executable to the production
machine. I suspect I'll be copying the binary over to the production
system.

I moved the core file to the development machine where I built the new
binaries, then ran gdb against this core file and the postgres binary on
this machine. The core was not generated on the development machine.

> On the other hand, if you can reproduce the failure on the development
> machine, that core file would serve just fine. (You'd only need to copy
> the tables and relevant data from production to said machine).

I have not had any success in duplicating the failure on my development
environment. I suspect it's because I can't generate the volume of
users. The production system could well have 150-200 users at one time
and we get a core file generated about 3-4 times a week, generally on
the busiest days.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
- Benjamin Franklin


From: Dimitri Fontaine <dim(at)dalibo(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Nicolas Niclausse <nico(at)niclux(dot)org>
Subject: Re: updating to 7.4.13 helped it appears
Date: 2006-10-31 22:38:44
Message-ID: 200610312338.44426.dim@dalibo.com

Hi list,

On Tuesday 31 October 2006 20:00, Geoffrey wrote:
> I have not had any success in duplicating the failure on my development
> environment. I suspect it's because I can't generate the volume of
> users. The production system could well have 150-200 users at one time
> and we get a core file generated about 3-4 times a week, generally on
> the busiest days.

You could use pgfouine[1] and tsung[2] to easily reproduce such a load,
following those steps:
- activate query logging on your dev env, using syslog
- use pgfouine to produce a tsung session file [3]
- write a tsung configuration file using this session file
- use tsung to simulate as many users as wanted

[1]: http://pgfouine.projects.postgresql.org/
[2]: http://tsung.erlang-projects.org/
[3]: http://pgfouine.projects.postgresql.org/tsung.html

Regards,
--
Dimitri Fontaine
http://www.dalibo.com/