My Experiment of PG crash when dealing with huge amount of data

From: 高健 <luckyjackgao(at)gmail(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-08-30 09:10:42
Message-ID: CAL454F3M19-Qgbfj3fp8_=w3CWWy5AWGKNTP=sD33EfFDv6JnA@mail.gmail.com
Lists: pgsql-general

Hello:

I have run the following experiment to test:

PG's behavior when dealing with data larger than the total memory of
the whole OS.

The result is:
PG says:
----------------------------
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
The connection to the server was lost. Attempting reset: Failed.
!>
-----------------------------

In the log, I can see the following:
LOG: background writer process (PID 3221) was terminated by signal 9:
Killed

But why did it need to kill the bgwriter?
And will PG use as much memory as it can to finish a SQL DML statement,
until it has consumed almost all resources and then crashes?

I used the default checkpoint_segments (3) and shared_buffers (32MB)
settings, and my machine has only 1024MB of memory.
The procedure was as follows:
1) Create a table whose tuples are each more than 1KB:

postgres=# create table test01(id integer, val char(1024));

2) Insert 2457600 records into the table, which totals more than
2400MB:

postgres=# insert into test01 values(generate_series(1,2457600),repeat(
chr(int4(random()*26)+65),1024));
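
To confirm the resulting on-disk size after such a load, the built-in
size functions can be used; a minimal sketch:

postgres=# select pg_size_pretty(pg_total_relation_size('test01'));

With 2457600 rows of roughly 1KB each, plus per-tuple overhead, this
should report on the order of 2400MB or more.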

It took a few minutes. While the SQL statement was running, I could see
the server process consuming 80% of the total memory of the OS.

----------------------------------------------------------------------------------------------------------------------------------

[root(at)server ~]# ps aux | grep post
root      3180  0.0  0.0 105296    712 pts/1 S  16:31 0:00 su - postgres
postgres  3181  0.0  0.0  70304    676 pts/1 S+ 16:31 0:00 -bash
postgres  3219  0.0  0.2 113644   2864 pts/1 S  16:32 0:00 /usr/local/pgsql/bin/postgres -D /gao/data
postgres  3221  0.4  3.0 113724  35252 ?     Ss 16:32 0:01 postgres: writer process
postgres  3222  0.2  0.1 113644   1616 ?     Ds 16:32 0:00 postgres: wal writer process
postgres  3223  0.0  0.0 114380   1148 ?     Ss 16:32 0:00 postgres: autovacuum launcher process
postgres  3224  0.0  0.0  73332    472 ?     Ss 16:32 0:00 postgres: stats collector process
root      3252  0.0  0.0 105296    712 pts/2 S  16:32 0:00 su - postgres
postgres  3253  0.0  0.0  70304    676 pts/2 S  16:32 0:00 -bash
postgres  3285  0.0  0.0  83488    740 pts/2 S+ 16:32 0:00 ./psql
postgres  3286 14.8 80.2 2598332 924308 ?    Ds 16:32 0:35 postgres: postgres postgres [local] INSERT
root      3333  0.0  0.0  65424    812 pts/3 S+ 16:36 0:00 grep post
--------------------------------------------------------

After a while, I found the following information in PG's log:
--------------------------------------------------------

LOG: autovacuum launcher started
LOG: database system is ready to accept connections
LOG: checkpoints are occurring too frequently (2 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (1 second apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
[... 27 more such LOG/HINT pairs, at intervals ranging from 1 to 23 seconds ...]
LOG: background writer process (PID 3221) was terminated by signal 9:
Killed
LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and
repeat your command.
[... the same WARNING/DETAIL/HINT repeated three more times ...]
LOG: all server processes terminated; reinitializing
FATAL: the database system is in recovery mode
LOG: database system was interrupted; last known up at 2013-08-30 16:36:42
CST
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: consistent recovery state reached at 0/B7657BD0
LOG: redo starts at 0/B60FE2B8
LOG: unexpected pageaddr 0/B044C000 in log file 0, segment 184, offset
4505600
LOG: redo done at 0/B844B940
LOG: autovacuum launcher started
LOG: database system is ready to accept connections

-------------------------------------------------------------------------------------

Best Regards


From: hxreno1 <hxreno1(at)gmail(dot)com>
To: 高健 <luckyjackgao(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-08-30 09:57:00
Message-ID: 52206C6C.4050405@gmail.com
Lists: pgsql-general

This is probably the operating system's OOM killer killing the PG
process; check syslog.

On Fri 30 Aug 2013 05:10:42 PM CST, 高健 wrote:
> Hello:
>
> I have run the following experiment to test:
>
> PG's behavior when dealing with data larger than the total memory of
> the whole OS.
>
> In the log, I can see the following:
> LOG: background writer process (PID 3221) was terminated by signal 9:
> Killed
>
> But why did it need to kill the bgwriter?
> [...]


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: 高健 <luckyjackgao(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-08-30 13:07:26
Message-ID: CAB7nPqSuLujfaF6rNHDSqiA8b_rr0reSi_mS+PenS6ueS1emvw@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 30, 2013 at 6:10 PM, 高健 <luckyjackgao(at)gmail(dot)com> wrote:
> In the log, I can see the following:
> LOG: background writer process (PID 3221) was terminated by signal 9:
> Killed
Assuming that no user on your server manually killed this process, and
that no maintenance task you implemented did so, this looks like the
Linux OOM killer acting because of memory overcommit. Have a look here
for more details:
http://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT
So have a look at dmesg to confirm that, then use one of the strategies
described in the docs. Also, since you are doing a bulk INSERT, you
should temporarily increase checkpoint_segments to reduce the pressure
on the background writer by reducing the number of checkpoints. This
will also make your data load faster.
--
Michael
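
For reference, a minimal sketch of those steps, assuming a 9.x server
(checkpoint_segments was replaced by max_wal_size in 9.5) and a shell
where $PGDATA points at the data directory; the value 32 is purely
illustrative:

# confirm that the OOM killer fired
dmesg | grep -i "killed process"

# one strategy from the docs: disable memory overcommit
sysctl -w vm.overcommit_memory=2

# temporarily raise checkpoint_segments for the bulk load, then reload
echo "checkpoint_segments = 32" >> $PGDATA/postgresql.conf
pg_ctl -D $PGDATA reload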


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: 高健 <luckyjackgao(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-08-31 20:42:29
Message-ID: CAMkU=1xDZ-yaK+mzLweqHL1wNCf0MTgh+Vpe1SwmGzzgicBXBQ@mail.gmail.com
Lists: pgsql-general

On Fri, Aug 30, 2013 at 2:10 AM, 高健 <luckyjackgao(at)gmail(dot)com> wrote:
>
>
> postgres=# insert into test01 values(generate_series(1,2457600),repeat(
> chr(int4(random()*26)+65),1024));

The construct "values (srf1, srf2)" generates its entire result set
in memory up front; it does not "stream" its results to the INSERT
statement on the fly.

To spare memory, you would want to use something like:

insert into test01 select generate_series,
repeat(chr(int4(random()*26)+65),1024) from
generate_series(1,2457600);

Cheers,

Jeff


From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-09-02 01:25:43
Message-ID: CAL454F0wXbvXSmiV7qm0dvGtArbR0jp2yrgjMi1uzqm8AE0eig@mail.gmail.com
Lists: pgsql-general

>To spare memory, you would want to use something like:

>insert into test01 select generate_series,
>repeat(chr(int4(random()*26)+65),1024) from
>generate_series(1,2457600);

Thanks a lot!

What I am worried about is this:
If data grows rapidly, maybe our customers will use too much memory. Is
the ulimit command a good idea for PG?

Best Regards

2013/9/1 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>

> On Fri, Aug 30, 2013 at 2:10 AM, 高健 <luckyjackgao(at)gmail(dot)com> wrote:
> >
> >
> > postgres=# insert into test01 values(generate_series(1,2457600),repeat(
> > chr(int4(random()*26)+65),1024));
>
> The construct "values (srf1,srf2)" will generate its entire result set
> in memory up front, it will not "stream" its results to the insert
> statement on the fly.
>
> To spare memory, you would want to use something like:
>
> insert into test01 select generate_series,
> repeat(chr(int4(random()*26)+65),1024) from
> generate_series(1,2457600);
>
> Cheers,
>
> Jeff
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: 高健 <luckyjackgao(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-09-02 02:37:07
Message-ID: 16209.1378089427@sss.pgh.pa.us
Lists: pgsql-general

高健 <luckyjackgao(at)gmail(dot)com> writes:
> If data grows rapidly, maybe our customer will use too much memory , Is
> ulimit command a good idea for PG?

There's no received wisdom saying that it is. There's a fairly widespread
consensus that disabling OOM kill can be a good idea, but I don't recall
that many people have tried setting specific ulimits on server processes.

Keep in mind that exceeding a ulimit would cause queries to fail outright
(whether the server was under much load or not), versus just getting
slower if the server starts to swap under too much load. I can imagine
situations where that would be considered a good tradeoff, but it's hardly
right for everyone.

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: 高健 <luckyjackgao(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-09-02 19:06:58
Message-ID: CAMkU=1xFGpLW323BCFSQYDuJDmQy54ruefci47evmAatkP8NHA@mail.gmail.com
Lists: pgsql-general

On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao(at)gmail(dot)com> wrote:
>>To spare memory, you would want to use something like:
>
>>insert into test01 select generate_series,
>>repeat(chr(int4(random()*26)+65),1024) from
>>generate_series(1,2457600);
>
> Thanks a lot!
>
> What I am worried about is this:
> If data grows rapidly, maybe our customers will use too much memory.

The size of the data has little to do with it. Take your own query as
an example. The database could have been nearly empty before you
started running that query. A hostile or adventurous user can craft
queries that will exhaust the server's memory without ever needing any
particular amount of data in data_directory, except maybe in the temp
tablespace.

So it is a matter of what kind of users you have, not how much data
you anticipate having on disk.

The parts of PostgreSQL that might blow up memory based on ordinary
disk-based tables are pretty well protected by shared_buffers,
temp_buffers, work_mem, maintenance_work_mem, etc. already. It is the
things that don't directly map to data already on disk which are
probably more vulnerable.
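
As an aside, those limits are easy to inspect from psql; a small sketch
using the pg_settings view:

postgres=# select name, setting, unit from pg_settings
postgres-# where name in ('shared_buffers', 'temp_buffers',
postgres-#                'work_mem', 'maintenance_work_mem');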

> Is the ulimit command a good idea for PG?

I've used ulimit -v on a test server (which was intentionally used to
test things to limits of destruction), and was happy with the results.
It seemed like it would error out the offending process, or just the
offending statement, in a graceful way; rather than having random
processes other than the culprit be brutally killed by OOM, or having
the machine just swap itself into uselessness. I'd be reluctant to
use it on production just on spec that something bad *might* happen
without it, but if I started experiencing problems caused by a single
rogue process using outrageous amounts of memory, that would be one of
my first stops.

Experimentally, shared memory does count against the -v limit, and the
limit has to be set rather higher than shared_buffers, or else your
database won't even start.

Cheers,

Jeff
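
A minimal sketch of that approach (the 1GB value is purely
illustrative; as noted above, the limit must be set comfortably higher
than shared_buffers):

# in the shell that starts the postmaster; ulimit -v is in kilobytes
ulimit -v 1048576
pg_ctl -D /gao/data start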


From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-09-03 01:03:45
Message-ID: CAL454F32xT_7eowWsJ5yvBVVHgaWEjnY8FeBZT6ONKr+StaaTA@mail.gmail.com
Lists: pgsql-general

Thanks, I'll consider it carefully.

Best Regards

2013/9/3 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>

> On Sun, Sep 1, 2013 at 6:25 PM, 高健 <luckyjackgao(at)gmail(dot)com> wrote:
> [...]


From: 高健 <luckyjackgao(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: My Experiment of PG crash when dealing with huge amount of data
Date: 2013-09-06 08:58:00
Message-ID: CAL454F1JjxP=AqbYyQ5dYWAD8v+SpE8FzM4QpSZ48ZVRDoGmCQ@mail.gmail.com
Lists: pgsql-general

Hello:

Sorry to disturb you again.
Some of my friends told me about cgroups, so I tried that first.
I found that cgroups works for tasks such as wget,
but it doesn't work for my postgres process.

[root(at)cent6 Desktop]# cat /etc/cgconfig.conf
#
# Copyright IBM Corporation. 2007
#
# Authors: Balbir Singh <balbir(at)linux(dot)vnet(dot)ibm(dot)com>
# This program is free software; you can redistribute it and/or modify it
# under the terms of version 2.1 of the GNU Lesser General Public License
# as published by the Free Software Foundation.
#
# This program is distributed in the hope that it would be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See man cgconfig.conf for further details.
#
# By default, mount all controllers to /cgroup/<controller>

mount {
    cpuset = /cgroup/cpuset;
    cpu = /cgroup/cpu;
    cpuacct = /cgroup/cpuacct;
    memory = /cgroup/memory;
    devices = /cgroup/devices;
    freezer = /cgroup/freezer;
    net_cls = /cgroup/net_cls;
    blkio = /cgroup/blkio;
}

group test1 {
    perm {
        task {
            uid = postgres;
            gid = postgres;
        }
        admin {
            uid = root;
            gid = root;
        }
    }
    memory {
        memory.limit_in_bytes = 500M;
    }
}

[root(at)cent6 Desktop]#

[root(at)cent6 Desktop]# service cgconfig status
Running
[root(at)cent6 Desktop]#

When I start postgres and run the above SQL statement, it still
consumes too much memory, as if cgroups were not working.
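
One likely explanation, offered here as an assumption rather than a
verified diagnosis: the perm block in cgconfig.conf only controls who
may administer the group and move tasks into it; it does not place
processes into the group by itself. The postgres processes must be
classified explicitly, for example by starting the server under cgexec,
or by adding a rule to /etc/cgrules.conf for the cgred daemon. A
minimal sketch:

# start the server inside the memory group directly
cgexec -g memory:test1 pg_ctl -D /gao/data start

# or classify automatically via a rule in /etc/cgrules.conf (read by cgred)
# <user>      <controllers>   <destination>
postgres      memory          test1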

Best Regards

2013/9/3 高健 <luckyjackgao(at)gmail(dot)com>

> Thanks, I'll consider it carefully.
>
> Best Regards
>
> 2013/9/3 Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
>
>> [...]
>