Re: BUG #2168: 45.000.000 records too much?

Lists: pgsql-bugs
From: "Steven Mooij" <steven(at)mooij(dot)name>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #2168: 45.000.000 records too much?
Date: 2006-01-13 13:51:16
Message-ID: 20060113135116.64365F0B01@svr2.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 2168
Logged by: Steven Mooij
Email address: steven(at)mooij(dot)name
PostgreSQL version: 8.1
Operating system: Linux
Description: 45.000.000 records too much?
Details:

Tried to copy the result of a join into another table, about 45.000.000
(small) records. This is the result:

testsearch=> insert into t_documentword2 (SELECT document_id, t_word2.id,
frequency from t_documentword, t_word2 where t_documentword.word =
t_word2.word);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I got the same result in version 7.5.15.


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2168: 45.000.000 records too much?
Date: 2006-01-14 08:02:50
Message-ID: dqab4t$1anq$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


""Steven Mooij"" <steven(at)mooij(dot)name> wrote
>
>
> Tried to copy the result of a join into another table, about 45.000.000
> (small) records. This is the result:
>
> testsearch=> insert into t_documentword2 (SELECT document_id, t_word2.id,
> frequency from t_documentword, t_word2 where t_documentword.word =
> t_word2.word);
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>

I guess you are Rudolph's friend ;-) Please follow the post "Error when
inserting millions of record" in pg-general.

Regards,
Qingqing


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Steven Mooij" <steven(at)mooij(dot)name>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2168: 45.000.000 records too much?
Date: 2006-01-14 17:57:36
Message-ID: 20606.1137261456@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Steven Mooij" <steven(at)mooij(dot)name> writes:
> testsearch=> insert into t_documentword2 (SELECT document_id, t_word2.id,
> frequency from t_documentword, t_word2 where t_documentword.word =
> t_word2.word);
> server closed the connection unexpectedly

There's not enough information here to guess what the problem is.
(But it's not table size; people routinely manipulate tables much
bigger than that in Postgres.) A stack trace from the core dump
would be really helpful, or even better a self-contained test case
that other people could replicate the failure with. See the bug
reporting guidelines at
http://www.postgresql.org/docs/8.1/static/bug-reporting.html

> I got the same result in version 7.5.15.

There is no version of Postgres named 7.5.anything.

regards, tom lane


From: Steven Mooij <steven(at)mooij(dot)name>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2168: 45.000.000 records too much?
Date: 2006-01-18 12:44:46
Message-ID: 43CE383E.4010707@mooij.name
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

> "Steven Mooij" <steven(at)mooij(dot)name> writes:
>
>
>> testsearch=> insert into t_documentword2 (SELECT document_id,
>> t_word2.id,
>> frequency from t_documentword, t_word2 where t_documentword.word =
>> t_word2.word);
>> server closed the connection unexpectedly
>>
>
>
> There's not enough information here to guess what the problem is.
> (But it's not table size; people routinely manipulate tables much
> bigger than that in Postgres.) A stack trace from the core dump
> would be really helpful, or even better a self-contained test case
> that other people could replicate the failure with. See the bug
> reporting guidelines at
> http://www.postgresql.org/docs/8.1/static/bug-reporting.html
>
>
>

I created a self-contained test case, here's a script that produces a
similar error:

CREATE TABLE
t_test1
(
x bigint,
y bigint,
PRIMARY KEY(x)
)
WITHOUT OIDS;

CREATE TABLE
t_test2
(
x bigint,
z bigint,
PRIMARY KEY (x),
FOREIGN KEY (x) REFERENCES t_test1(x)
)
WITHOUT OIDS;

INSERT INTO t_test1 VALUES (1, 2);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);
INSERT INTO t_test1 (SELECT x + (SELECT count(*) FROM t_test1), y +
(SELECT count(*) FROM t_test1) FROM t_test1);

INSERT INTO t_test2 (SELECT x, x * 2 FROM t_test1);

The first block of inserts completes flawlessly and fills t_test1 with
over 64.000.000 records. It's the last statement copying (no join
involved this time) from t_test1 to t_test2 that results in:

server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

I am not 100% sure, but i think this is the relevant part of the logfile
that goes with it:

LOG: database system was interrupted at 2006-01-18 00:07:48 CET
LOG: checkpoint record is at B/A2857768
LOG: redo record is at B/A28535B0; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 2724; next OID: 75506450
LOG: database system was not properly shut down; automatic recovery in
progressLOG: incomplete startup packet
LOG: redo starts at B/A28535B0
LOG: record with zero length at B/A28577EC
LOG: redo done at B/A28577A8
FATAL: the database system is starting up
LOG: database system is ready

>> I got the same result in version 7.5.15.
>>
>
>
> There is no version of Postgres named 7.5.anything.
>
>
>
I wrongfully looked at the version of the Debian package. When I start
psql it says "Welcome to psql 7.4.9, the PostgreSQL interactive terminal."

Hopefully this is enough information for you to look into it, but if you
do need anything else please let me know...


From: Steven Mooij <steven(at)mooij(dot)name>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2168: 45.000.000 records too much?
Date: 2006-01-18 12:48:50
Message-ID: 43CE3932.5050208@mooij.name
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Some additional info. We repeated the test on different hardware with
unfortunately the same result. This is the output of the postgresql
logfile of that experiment:

LOG: background writer process (PID 8208) was terminated by signal 9
LOG: terminating any other active server processes
LOG: statistics collector process (PID 8209) was terminated by signal 9
LOG: all server processes terminated; reinitializing
LOG: database system was interrupted at 2006-01-18 11:25:34 CET
LOG: checkpoint record is at 3/AE5CF484
LOG: redo record is at 3/AE5CB388; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 10061; next OID: 25648
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: redo starts at 3/AE5CB388
LOG: record with zero length at 3/AE5E7314
LOG: redo done at 3/AE5E72D0
LOG: database system is ready

This time we tested with postgresql 8.0.3 on a 2.6.12 linux kernel.
(Previously i tested with postgresql 7.4.9 on a 2.6.15 kernel.)

I think you will be able to repeat the experiment with this information.

Regards,

Steven


From: Seneca Cunningham <scunning(at)ca(dot)afilias(dot)info>
To: Steven Mooij <steven(at)mooij(dot)name>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2168: 45.000.000 records too much?
Date: 2006-01-19 22:03:18
Message-ID: 43D00CA6.2060004@ca.afilias.info
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Steven Mooij wrote:
> Some additional info. We repeated the test on different hardware with
> unfortunately the same result. This is the output of the postgresql
> logfile of that experiment:

[log messages about processes kill 9ed]

> This time we tested with postgresql 8.0.3 on a 2.6.12 linux kernel.
> (Previously i tested with postgresql 7.4.9 on a 2.6.15 kernel.)

Does dmesg or syslog mention the OOM killer? I tried running your test
case while watching memory usage and stopped the postmaster before it
finished as I didn't want to risk the kernel killing it (or some random
other process) the hard way.

--
Seneca Cunningham
scunning(at)ca(dot)afilias(dot)info


From: Steven Mooij <steven(at)mooij(dot)name>
To: Seneca Cunningham <scunning(at)ca(dot)afilias(dot)info>
Cc: pgsql-bugs(at)postgresql(dot)org, Rudolph Froger <r(dot)froger(at)estrate(dot)nl>
Subject: Re: BUG #2168: 45.000.000 records too much?
Date: 2006-01-20 10:17:51
Message-ID: 43D0B8CF.4070007@mooij.name
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Seneca Cunningham wrote:

>Steven Mooij wrote:
>
>
>>Some additional info. We repeated the test on different hardware with
>>unfortunately the same result. This is the output of the postgresql
>>logfile of that experiment:
>>
>>
>
>[log messages about processes kill 9ed]
>
>
>
>>This time we tested with postgresql 8.0.3 on a 2.6.12 linux kernel.
>>(Previously i tested with postgresql 7.4.9 on a 2.6.15 kernel.)
>>
>>
>
>Does dmesg or syslog mention the OOM killer? I tried running your test
>case while watching memory usage and stopped the postmaster before it
>finished as I didn't want to risk the kernel killing it (or some random
>other process) the hard way.
>
>
>
Yes, the oom-killer kills the postmaster. But that's besides the
problem. In my opinion the bug is that PostgreSQL has an inefficient
implementation of this operation, that takes way too much memory. (Of
course, I say this with a lot of respect for all the good
PostgreSQL-code that has been written.)

Jan 18 11:29:40 laptop2 kernel: [4304395.479000] oom-killer: gfp_mask=0x1d2
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] DMA per-cpu:
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 hot: low 2, high 6, batch 1
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 cold: low 0, high 2, batch 1
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Normal per-cpu:
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 hot: low 62, high 186, batch 31
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] cpu 0 cold: low 0, high 62, batch 31
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] HighMem per-cpu: empty
Jan 18 11:29:41 laptop2 kernel: [4304395.479000]
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Free pages: 4964kB (0kB HighMem)
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Active:52023 inactive:52455 dirty:0 writeback:0 unstable:0 free:1241 slab:3563 mapped:102467 pagetables:751
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] DMA free:2032kB min:88kB low:108kB high:132kB active:5492kB inactive:5512kB present:16384kB pages_scanned:16847 all_unreclaimable? yes
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] lowmem_reserve[]: 0 486 486
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Normal free:2932kB min:2776kB low:3468kB high:4164kB active:202600kB inactive:204308kB present:498560kB pages_scanned:379686 all_unreclaimable? no
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] lowmem_reserve[]: 0 0 0
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] lowmem_reserve[]: 0 0 0
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2032kB
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Normal: 63*4kB 1*8kB 7*16kB 6*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2932kB
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] HighMem: empty
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Swap cache: add 267257, delete 267257, find 1913/4312, race 0+3
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Free swap = 0kB
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Total swap = 979956kB
Jan 18 11:29:41 laptop2 kernel: [4304395.479000] Out of Memory: Killed process 8208 (postmaster).
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] oom-killer: gfp_mask=0x1d2
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] DMA per-cpu:
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 hot: low 2, high 6, batch 1
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 cold: low 0, high 2, batch 1
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Normal per-cpu:
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 hot: low 62, high 186, batch 31
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] cpu 0 cold: low 0, high 62, batch 31
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] HighMem per-cpu: empty
Jan 18 11:29:41 laptop2 kernel: [4304399.118000]
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Free pages: 4716kB (0kB HighMem)
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Active:51957 inactive:52556 dirty:0 writeback:0 unstable:0 free:1179 slab:3547 mapped:102505 pagetables:745
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] DMA free:2032kB min:88kB low:108kB high:132kB active:5496kB inactive:5528kB present:16384kB pages_scanned:18629 all_unreclaimable? yes
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] lowmem_reserve[]: 0 486 486
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Normal free:2684kB min:2776kB low:3468kB high:4164kB active:202332kB inactive:204696kB present:498560kB pages_scanned:463523 all_unreclaimable? no
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] lowmem_reserve[]: 0 0 0
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] lowmem_reserve[]: 0 0 0
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2032kB
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Normal: 1*4kB 1*8kB 7*16kB 6*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2684kB
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] HighMem: empty
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Swap cache: add 267302, delete 267302, find 1913/4316, race 0+3
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Free swap = 0kB
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Total swap = 979956kB
Jan 18 11:29:41 laptop2 kernel: [4304399.118000] Out of Memory: Killed process 8209 (postmaster).
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] oom-killer: gfp_mask=0x1d2
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] DMA per-cpu:
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 hot: low 2, high 6, batch 1
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 cold: low 0, high 2, batch 1
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Normal per-cpu:
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 hot: low 62, high 186, batch 31
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] cpu 0 cold: low 0, high 62, batch 31
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] HighMem per-cpu: empty
Jan 18 11:29:41 laptop2 kernel: [4304399.141000]
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Free pages: 4716kB (0kB HighMem)
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Active:51990 inactive:52547 dirty:0 writeback:26 unstable:0 free:1179 slab:3547 mapped:102473 pagetables:741
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] DMA free:2032kB min:88kB low:108kB high:132kB active:5496kB inactive:5528kB present:16384kB pages_scanned:18629 all_unreclaimable? yes
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] lowmem_reserve[]: 0 486 486
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Normal free:2684kB min:2776kB low:3468kB high:4164kB active:202464kB inactive:204660kB present:498560kB pages_scanned:491576 all_unreclaimable? yes
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] lowmem_reserve[]: 0 0 0
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] lowmem_reserve[]: 0 0 0
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] DMA: 0*4kB 0*8kB 1*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2032kB
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Normal: 1*4kB 1*8kB 7*16kB 6*32kB 3*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2684kB
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] HighMem: empty
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Swap cache: add 267329, delete 267303, find 1913/4317, race 0+3
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Free swap = 0kB
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Total swap = 979956kB
Jan 18 11:29:41 laptop2 kernel: [4304399.141000] Out of Memory: Killed process 9294 (postmaster).

--
Steven Mooij
Koekamp 83
2623XP Delft

e steven(at)mooij(dot)name
t 015 256 8304