Re: My Experiment of PG crash when dealing with huge amount of data

From: hxreno1 <hxreno1(at)gmail(dot)com>
To: 高健 <luckyjackgao(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-08-30 09:57:00
Message-ID: 52206C6C.4050405@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

This should be the operating system OOM kills pg process,check syslog

On Fri 30 Aug 2013 05:10:42 PM CST, 高健 wrote:
> Hello:
>
> I have done the following experiment to test :
>
> PG's activity when dealing with data which is bigger in size than
> total memory of the whole os system.
>
> The result is:
> PG says:
> ----------------------------
> WARNING: terminating connection because of crash of another server
> process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> The connection to the server was lost. Attempting reset: Failed.
> !>
> -----------------------------
>
> In log, I can see the following:
> LOG: background writer process (PID 3221) was terminated by signal 9:
> Killed
>
> But why it need to kill the bgwriter ?
> And that PG will use as much resource as it can to finish a sql dml ,
> till it used almost all the resource and then crash?
>
> I used the default checkpoint_segments(3) and shard_buffers(32MB)
> settings. And my machine has only 1024MB memory.
> The process is as :
> 1) create table whose one tuple is more than 1KB:
>
> postgres=# create table test01(id integer, val char(1024));
>
> 2) insert into the table 2457600 records, which makes totally more
> than 2400MB:
>
> postgres=# insert into test01
> values(generate_series(1,2457600),repeat(
> chr(int4(random()*26)+65),1024));
>
> It really took a few minutes, When the sql statement run, I can see
> that the server process is consuming 80% of total memory of the os.
>
> ----------------------------------------------------------------------------------------------------------------------------------
>
> [root(at)server ~]# ps aux | grep post
>
> root 3180 0.0 0.0 105296 712 pts/1 S 16:31 0:00 su -
> postgres
>
> postgres 3181 0.0 0.0 70304 676 pts/1 S+ 16:31 0:00 -bash
>
> postgres 3219 0.0 0.2 113644 2864 pts/1 S 16:32 0:00
> /usr/local/pgsql/bin/postgres -D /gao/data
>
> postgres 3221 0.4 3.0 113724 35252 ? Ss 16:32 0:01
> postgres: writer process
>
> postgres 3222 0.2 0.1 113644 1616 ? Ds 16:32 0:00
> postgres: wal writer process
>
> postgres 3223 0.0 0.0 114380 1148 ? Ss 16:32 0:00
> postgres: autovacuum launcher process
>
> postgres 3224 0.0 0.0 73332 472 ? Ss 16:32 0:00
> postgres: stats collector process
>
> root 3252 0.0 0.0 105296 712 pts/2 S 16:32 0:00 su -
> postgres
>
> postgres 3253 0.0 0.0 70304 676 pts/2 S 16:32 0:00 -bash
>
> postgres 3285 0.0 0.0 83488 740 pts/2 S+ 16:32 0:00 ./psql
>
> postgres 3286 14.8 80.2 2598332 924308 ? Ds 16:32 0:35
> postgres: postgres postgres [local] INSERT
>
> root 3333 0.0 0.0 65424 812 pts/3 S+ 16:36 0:00 grep post
>
> --------------------------------------------------------
>
> After a while, I found in the PG's log , the following information:
> --------------------------------------------------------
>
> LOG: autovacuum launcher started
> LOG: database system is ready to accept connections
> LOG: checkpoints are occurring too frequently (2 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (1 second apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (2 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (1 second apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (2 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (1 second apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (2 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (4 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (10 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (9 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (7 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (9 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (7 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (5 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (5 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (10 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (11 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (15 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (23 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (7 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (8 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (13 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (7 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (8 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (8 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (8 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (4 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (5 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: checkpoints are occurring too frequently (9 seconds apart)
> HINT: Consider increasing the configuration parameter
> "checkpoint_segments".
> LOG: background writer process (PID 3221) was terminated by signal 9:
> Killed
> LOG: terminating any other active server processes
> WARNING: terminating connection because of crash of another server
> process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> WARNING: terminating connection because of crash of another server
> process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> WARNING: terminating connection because of crash of another server
> process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> WARNING: terminating connection because of crash of another server
> process
> DETAIL: The postmaster has commanded this server process to roll back
> the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database and
> repeat your command.
> LOG: all server processes terminated; reinitializing
> FATAL: the database system is in recovery mode
> LOG: database system was interrupted; last known up at 2013-08-30
> 16:36:42 CST
> LOG: database system was not properly shut down; automatic recovery
> in progress
> LOG: consistent recovery state reached at 0/B7657BD0
> LOG: redo starts at 0/B60FE2B8
> LOG: unexpected pageaddr 0/B044C000 in log file 0, segment 184,
> offset 4505600
> LOG: redo done at 0/B844B940
> LOG: autovacuum launcher started
> LOG: database system is ready to accept connections
>
> -------------------------------------------------------------------------------------
>
> Best Regards
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Michael Paquier 2013-08-30 12:47:19 Re: Using of replication by initdb for both nodes?
Previous Message Andreas Joseph Krogh 2013-08-30 09:22:56 Re: Why is NULL = unbounded for rangetypes?