Re: nested hstore - large insert crashes server

Lists: pgsql-hackers
From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Cc: teodor(at)sigaev(dot)ru, obartunov(at)gmail(dot)com, andrew(at)dunslane(dot)net
Subject: nested hstore - large insert crashes server
Date: 2014-02-04 16:41:06
Message-ID: d409bac48eb9a6d20c73c277afcd8bbb.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

CentOS Release 6.5 (final)
AMD FX(tm)-8120 Eight-Core
2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
memory: 8GB

I am testing nested hstore, on a server with both with these patches:
jsonb-9.patch.gz
nested-hstore-9.patch.gz

One of the first tries brings down the server (gobbles up too much memory, I think).

When I run:

--------------------
select version();

drop table if exists t_10000000 ;
create table t_10000000 (
id serial
, hs hstore primary key
, hs2 hstore
);
insert into t_10000000 (hs,hs2)
select
( '[' || i || ',' || i|| ']' ) ::hstore
, ( '{' || i || '=>' || i|| '}' ) ::hstore
from generate_series(1, 10000000) as f(i)
;
---------------------

I get:

$ time psql -af nestedhs.sql
\timing on
Timing is on.
select version();
version
--------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4devel_nested_hstore_20140204_0814_00d4f2a on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.2, 64-bit
(1 row)

Time: 1.288 ms
drop table if exists t_10000000 ;
DROP TABLE
Time: 0.808 ms
create table t_10000000 (
id serial
, hs hstore primary key
, hs2 hstore
);
CREATE TABLE
Time: 111.397 ms
insert into t_10000000 (hs,hs2)
select
( '[' || i || ',' || i|| ']' ) ::hstore
, ( '{' || i || '=>' || i|| '}' ) ::hstore
from generate_series(1, 10000000) as f(i)
;
[psql:nestedhs.sql:14: connection to server was lost

real 5m4.780s
user 0m0.005s
sys 0m0.009s

logging:

2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed
2014-02-04 10:34:25.854 CET 29133 DETAIL: Failed process was running: insert into t_10000000 (hs,hs2)
select
( '[' || i || ',' || i|| ']' ) ::hstore
, ( '{' || i || '=>' || i|| '}' ) ::hstore
from generate_series(1, 10000000) as f(i)
;
2014-02-04 10:34:25.884 CET 29133 LOG: terminating any other active server processes
2014-02-04 10:34:28.541 CET 29133 LOG: all server processes terminated; reinitializing
2014-02-04 10:34:30.002 CET 29534 LOG: database system was interrupted; last known up at 2014-02-04 10:30:42 CET
2014-02-04 10:34:30.933 CET 29535 FATAL: the database system is in recovery mode
2014-02-04 10:34:31.150 CET 29534 LOG: database system was not properly shut down; automatic recovery in progress
2014-02-04 10:34:31.344 CET 29534 LOG: redo starts at 1/B1CC92F8
2014-02-04 10:34:46.681 CET 29534 LOG: unexpected pageaddr 1/86F4A000 in log segment 0000000100000001000000CC, offset
16031744
2014-02-04 10:34:46.681 CET 29534 LOG: redo done at 1/CCF49F50
2014-02-04 10:34:52.039 CET 29133 LOG: database system is ready to accept connections

(and btw, I end up with a large but unusable table:

testdb=# \dt+ t_10000000
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------------+-------+----------+--------+-------------
public | t_10000000 | table | aardvark | 291 MB |
(1 row)

testdb=# select count(*) from t_10000000;
count
-------
0
(1 row)

)

Main .conf settings:

setting | current_setting
--------------------------+----------------------------------------------
autovacuum | off
port | 6541
shared_buffers | 512MB
effective_cache_size | 2GB
work_mem | 50MB
maintenance_work_mem | 1GB
checkpoint_segments | 20
server_version | 9.4devel_nested_hstore_20140204_0814_00d4f2a
pg_postmaster_start_time | 2014-02-04 10:12 (uptime: 0d 0h 33m 42s)
data_checksums | off

feature_id | feature_name | is_supported | is_verified_by | comments
------------+--------------------+--------------+----------------+-------------------------------------------------------------------------------------
PKG100 | project name | YES | ej | nested_hstore
PKG101 | patched | YES | ej | YES
PKG102 | patch file | YES | ej |
/home/aardvark/download/pgpatches/0094/nested_hstore/20140130/jsonb-9.patch +
| | | |
/home/aardvark/download/pgpatches/0094/nested_hstore/20140130/nested-hstore-9.patch
PKG103 | build time | YES | ej | 2014-02-04 08:19:13.600371+01
PKG104 | server_version | YES | ej | 9.4devel_nested_hstore_20140204_0814_00d4f2a
PKG105 | server_version_num | YES | ej | 90400
PKG106 | port | YES | ej | 6541
PKG110 | commit hash | YES | ej | 00d4f2af8bd6a1b9db2f676cc76b64d98ace99fb
PKG111 | catversion | YES | ej | 201402031
PKG112 | control version | YES | ej | 937
PKG113 | xlog_page_magic | YES | ej | 0xD07B

I suppose for larger machines the crash will happen later; on this small machine I can generate_series a 1 million row
table without crashing.

Thanks,

Erik Rijkers


From: Christian Kruse <christian(at)2ndQuadrant(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, obartunov(at)gmail(dot)com, andrew(at)dunslane(dot)net
Subject: Re: nested hstore - large insert crashes server
Date: 2014-02-04 17:56:09
Message-ID: 20140204175609.GA11142@defunct.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 04/02/14 17:41, Erik Rijkers wrote:

> 2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed

Did you check if this was the OOM killer? Should be logged in dmesg.

Best regards,

--
Christian Kruse http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Christian Kruse" <christian(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, obartunov(at)gmail(dot)com, andrew(at)dunslane(dot)net
Subject: Re: nested hstore - large insert crashes server
Date: 2014-02-04 18:51:53
Message-ID: 1ce2630f452781b17a9fc565adc65af8.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, February 4, 2014 18:56, Christian Kruse wrote:
> Hi,
>
> On 04/02/14 17:41, Erik Rijkers wrote:
>
>> 2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed
>
> Did you check if this was the OOM killer? Should be logged in dmesg.
>

I would be surprised if it wasn't. (no access to that machine at the moment)

How do we regard such crashes? It seems to me this was rather eaasily 'provoked' (for want of a better word).

I am inclined to blame the patch...


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Erik Rijkers" <er(at)xs4all(dot)nl>
Cc: "Christian Kruse" <christian(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, obartunov(at)gmail(dot)com, andrew(at)dunslane(dot)net
Subject: Re: nested hstore - large insert crashes server
Date: 2014-02-04 19:14:59
Message-ID: 27718.1391541299@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Erik Rijkers" <er(at)xs4all(dot)nl> writes:
> On Tue, February 4, 2014 18:56, Christian Kruse wrote:
>> Did you check if this was the OOM killer? Should be logged in dmesg.

> I would be surprised if it wasn't. (no access to that machine at the moment)

> How do we regard such crashes? It seems to me this was rather eaasily 'provoked' (for want of a better word).

Well, it suggests that there may be a memory leak, which would be a bug
even though we'd assign it lower priority than a true crash.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Christian Kruse <christian(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, teodor(at)sigaev(dot)ru, obartunov(at)gmail(dot)com, andrew(at)dunslane(dot)net
Subject: Re: nested hstore - large insert crashes server
Date: 2014-02-04 19:17:06
Message-ID: 20140204191706.GU2921@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Erik Rijkers (er(at)xs4all(dot)nl) wrote:
> On Tue, February 4, 2014 18:56, Christian Kruse wrote:
> > On 04/02/14 17:41, Erik Rijkers wrote:
> >> 2014-02-04 10:34:25.376 CET 29133 LOG: server process (PID 29459) was terminated by signal 9: Killed
> >
> > Did you check if this was the OOM killer? Should be logged in dmesg.
>
> I would be surprised if it wasn't. (no access to that machine at the moment)
>
> How do we regard such crashes? It seems to me this was rather eaasily 'provoked' (for want of a better word).
>
> I am inclined to blame the patch...

It sounds like there is at least some investigation which should happen
here to see why we're using so much memory (well beyond work_mem and
even maint_work_mem it sounds like), but it would also be good to have
the machine reconfigured to not allow OOM killing.

Thanks,

Stephen