query failed, not enough memory on 8.3.5

Lists: pgsql-bugs
From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: pgsql-bugs(at)postgresql(dot)org
Subject: query failed, not enough memory on 8.3.5
Date: 2008-11-28 19:08:53
Message-ID: 6321D0FC-26FC-4E43-A8E3-6F4A141F00D3@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

hey folks
(please CC, I'm not subscribed).

I am trying to merge duplicates of two fields in a rather large table
(139M rows, about 7GB of data).
creating helper table, that would hold it.
my feeling is, postgresql shouldn't go beyond memory bounds I am
setting it in conf, never had any problems like it before.

Table:

staty=> \d+ stats_busy
Table "public.stats_busy"
Column | Type | Modifiers
| Description
----------------+--------------------------------+--------------------
+-------------
size_to | bigint | not null default 0 |
size_from | bigint | not null default 0 |
mac | integer | not null |
data_timestamp | timestamp(0) without time zone | not null |
data | integer | not null |
Indexes:
"d" btree (data)
"dm" btree (data, mac)
Foreign-key constraints:
"m" FOREIGN KEY (mac) REFERENCES macs(id) MATCH FULL
Has OIDs: no

conf file:
/var/lib/postgresql/8.3/main# grep -v -E "^#|^[[:space:]]+#|^$"
postgresql.conf
max_connections = 100 # (change requires restart)
shared_buffers = 32MB # min 128kB or max_connections*16kB
temp_buffers = 32MB # min 800kB
work_mem = 64MB # min 64kB
maintenance_work_mem = 32MB # min 1MB
max_fsm_pages = 4013760 # min max_fsm_relations*16, 6 bytes each
vacuum_cost_page_hit = 1 # 0-10000 credits
vacuum_cost_page_miss = 10 # 0-10000 credits
vacuum_cost_page_dirty = 20 # 0-10000 credits
vacuum_cost_limit = 200 # 1-10000 credits
checkpoint_segments = 64 # in logfile segments, min 1, 16MB each
geqo = on
geqo_threshold = 12
geqo_effort = 5 # range 1-10
geqo_pool_size = 0 # selects default based on effort
geqo_generations = 0 # selects default based on effort
geqo_selection_bias = 2.0 # range 1.5-2.0
default_statistics_target = 200 # range 1-1000
log_destination = 'stderr' # Valid values are combinations of
logging_collector = on # Enable capturing of stderr and csvlog
log_directory = 'pg_log' # directory where log files are written,
log_line_prefix = '[%t %h %d %u %p]: ' # special values:
track_activities = on
track_counts = on
update_process_title = on
autovacuum = on # Enable autovacuum subprocess? 'on'
autovacuum_naptime = 60min # time between autovacuum runs
autovacuum_vacuum_threshold = 500000 # min number of row updates before
autovacuum_analyze_threshold = 250000 # min number of row updates before
autovacuum_freeze_max_age = 200000000 # maximum XID age before forced
vacuum
datestyle = 'iso, mdy'
lc_messages = 'en_US' # locale for system error message
lc_monetary = 'en_US' # locale for monetary formatting
lc_numeric = 'en_US' # locale for number formatting
lc_time = 'en_US' # locale for time formatting
default_text_search_config = 'pg_catalog.english'

~# ulimit -a; file /usr/lib/postgresql/8.3/bin/postgres; free; grep -i
-E 'work_mem|shared_buffers|max_connections' /etc/postgresql/8.3/main/
postgresql.conf
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 16375
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) 16375
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
/usr/lib/postgresql/8.3/bin/postgres: ELF 32-bit LSB executable, Intel
80386, version 1 (SYSV), for GNU/Linux 2.6.8, dynamically linked (uses
shared libs), stripped
total used free shared buffers
cached
Mem: 2071396 223888 1847508 0 6860
158284
-/+ buffers/cache: 58744 2012652
Swap: 1951888 104716 1847172
max_connections = 50
# note: increasing max_connections costs ~400 bytes of shared memory per
# might also need to raise shared_buffers to support more connections.
shared_buffers = 16000 # min 16 or max_connections*2, 8KB each
work_mem = 128140 # min 64, size in KB
maintenance_work_mem = 65535 # min 1024, size in KB
# max_locks_per_transaction * (max_connections +
max_prepared_transactions)

TopMemoryContext: 49416 total in 6 blocks; 6016 free (5 chunks); 43400
used
Type information cache: 8192 total in 1 blocks; 1800 free (0
chunks); 6392 used
Operator lookup cache: 24576 total in 2 blocks; 14072 free (6
chunks); 10504 used
TopTransactionContext: 8192 total in 1 blocks; 5560 free (0
chunks); 2632 used
MessageContext: 24576 total in 2 blocks; 12936 free (3 chunks);
11640 used
Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks);
4344 used
smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks);
5384 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0
chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 896 free (0 chunks);
128 used
ExecutorState: 24576 total in 2 blocks; 14520 free (15 chunks);
10056 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
AggContext: 1981800448 total in 250 blocks; 3768 free (117
chunks); 1981796680 used
TupleHashTable: 1158668312 total in 150 blocks; 465512 free
(580 chunks); 1158202800 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816
used
CacheMemoryContext: 667472 total in 20 blocks; 217568 free (8
chunks); 449904 used
pg_shdepend_depender_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
dm: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
d: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free
(0 chunks); 744 used
pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0
chunks); 832 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
pg_proc_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 152 free
(0 chunks); 872 used
pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free
(0 chunks); 832 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 88 free
(0 chunks); 936 used
pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_language_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_language_name_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks);
680 used
pg_database_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_constraint_oid_index: 1024 total in 1 blocks; 344 free (0
chunks); 680 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0
chunks); 744 used
pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0
chunks); 896 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0
chunks); 872 used
pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 280
free (0 chunks); 744 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 280
free (0 chunks); 744 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
pg_amop_opr_fam_index: 1024 total in 1 blocks; 240 free (0
chunks); 784 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 280 free
(0 chunks); 744 used
pg_operator_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0
chunks); 936 used
pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0
chunks); 936 used
pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0
chunks); 720 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free
(0 chunks); 784 used
pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks);
720 used
MdSmgr: 8192 total in 1 blocks; 6472 free (1 chunks); 1720 used
LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280
used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
ErrorContext: 8192 total in 1 blocks; 8176 free (3 chunks); 16 used
[2008-11-28 20:04:14 CET [local] staty citycos 4605]: ERROR: out of
memory
[2008-11-28 20:04:14 CET [local] staty citycos 4605]: DETAIL: Failed
on request of size 20.
[2008-11-28 20:04:14 CET [local] staty citycos 4605]: STATEMENT:
create table dups as select count(*) as c, data, mac from stats_busy
group by data,mac having count(*) >1;


From: "Greg Stark" <stark(at)enterprisedb(dot)com>
To: "Grzegorz Jaskiewicz" <gj(at)pointblue(dot)com(dot)pl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: query failed, not enough memory on 8.3.5
Date: 2008-11-30 13:31:04
Message-ID: 4136ffa0811300531i169f1fbaq98181d0156eaf573@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Nov 28, 2008 at 8:08 PM, Grzegorz Jaskiewicz
<gj(at)pointblue(dot)com(dot)pl> wrote:
> work_mem = 128140 # min 64, size in KB
> maintenance_work_mem = 65535 # min 1024, size in KB

Incidentally it's very odd to see maintenance_work_mem less than
work_mem. work_mem is used by regular queries and maintenance_work_mem
by things like create index.

> AggContext: 1981800448 total in 250 blocks; 3768 free (117 chunks);
> 1981796680 used
> TupleHashTable: 1158668312 total in 150 blocks; 465512 free (580
> chunks); 1158202800 used

Er, yeah. This is a known problem and it would be really nice to fix
it. However nobody's come up with any bright ideas for fixing it and
disabling hash aggregates doesn't seem worth it. The problem only
occurs when Postgres dramaticalyl misestimates the number of hash
buckets needed for hash aggregates.

Can you post the EXPLAIN for your query? you might have to analyze
your tables again, and possibly raise the stats target for the columns
you're aggregating.

Failing that you could set enable_hashaggregate to false for this query.

--
greg


From: Grzegorz Jaskiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: query failed, not enough memory on 8.3.5
Date: 2008-11-30 16:02:49
Message-ID: A8650FA2-4A13-4AFC-ADE8-2343313E1B3E@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


On 2008-11-30, at 13:31, Greg Stark wrote:

> On Fri, Nov 28, 2008 at 8:08 PM, Grzegorz Jaskiewicz
> <gj(at)pointblue(dot)com(dot)pl> wrote:
>> work_mem = 128140 # min 64, size in KB
>> maintenance_work_mem = 65535 # min 1024, size in KB
>
> Incidentally it's very odd to see maintenance_work_mem less than
> work_mem. work_mem is used by regular queries and maintenance_work_mem
> by things like create index.
>
>> AggContext: 1981800448 total in 250 blocks; 3768 free (117
>> chunks);
>> 1981796680 used
>> TupleHashTable: 1158668312 total in 150 blocks; 465512 free
>> (580
>> chunks); 1158202800 used
>
> Er, yeah. This is a known problem and it would be really nice to fix
> it. However nobody's come up with any bright ideas for fixing it and
> disabling hash aggregates doesn't seem worth it. The problem only
> occurs when Postgres dramaticalyl misestimates the number of hash
> buckets needed for hash aggregates.
>
> Can you post the EXPLAIN for your query? you might have to analyze
> your tables again, and possibly raise the stats target for the columns
> you're aggregating.
>
> Failing that you could set enable_hashaggregate to false for this
> query.

I already worked around this by setting work_mem to very low value,
since that query was suppose to be run only once anyway (one time
migration of schema).
Good to know that someone actually was able to analyze it, without
telling me 'man you gotta rewrite that query'.

to be honest, I thought postgresql will never fail that way, and is
able to use divide work into smaller pieces, or swap it on disc, in
case something like that happens.

Thanks again Greg.

--
Greg