Re: 8.3rc1 Out of memory when performing update

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: 8.3rc1 Out of memory when performing update
Date: 2008-01-24 23:46:20
Message-ID: F0238EBA67824444BC1CB4700960CB480482B525@dmpeints002.isotach.com
Lists: pgsql-performance

A simple update query, over roughly 17 million rows, populating a newly added column in a table, resulted in an out of memory error when the process memory usage reached 2GB. Could this be due to a poor choice of some configuration parameter, or is there a limit on how many rows I can update in a single statement?

Log:
...
2008-01-25 09:42:08.119 NZDT [3432]: [1-1] LOG: checkpoint starting: time
2008-01-25 09:42:08.697 NZDT [3432]: [2-1] LOG: checkpoint complete: wrote 2 buffers (0.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=0.218 s, sync=0.047 s, total=0.578 s
...
2008-01-25 10:44:49.011 NZDT [3824]: [1-1] LOG: connection received: host=(removed) port=3207
2008-01-25 10:44:49.042 NZDT [3824]: [2-1] LOG: connection authorized: user=postgres database=(removed)
2008-01-25 10:52:08.204 NZDT [3432]: [3-1] LOG: checkpoint starting: time
2008-01-25 10:52:39.673 NZDT [3432]: [4-1] LOG: checkpoint complete: wrote 275 buffers (6.7%); 1 transaction log file(s) added, 0 removed, 0 recycled; write=27.078 s, sync=1.485 s, total=31.407 s
2008-01-25 11:02:08.055 NZDT [3432]: [5-1] LOG: checkpoint starting: time
2008-01-25 11:02:32.759 NZDT [3432]: [6-1] LOG: checkpoint complete: wrote 222 buffers (5.4%); 0 transaction log file(s) added, 0 removed, 69 recycled; write=22.766 s, sync=0.968 s, total=24.704 s
2008-01-25 11:12:08.344 NZDT [3432]: [7-1] LOG: checkpoint starting: time
2008-01-25 11:12:38.423 NZDT [3432]: [8-1] LOG: checkpoint complete: wrote 268 buffers (6.5%); 0 transaction log file(s) added, 0 removed, 77 recycled; write=27.875 s, sync=1.312 s, total=30.094 s
2008-01-25 11:22:08.088 NZDT [3432]: [9-1] LOG: checkpoint starting: time
2008-01-25 11:22:29.526 NZDT [3432]: [10-1] LOG: checkpoint complete: wrote 188 buffers (4.6%); 0 transaction log file(s) added, 0 removed, 48 recycled; write=18.155 s, sync=1.391 s, total=21.312 s
2008-01-25 11:32:08.362 NZDT [3432]: [11-1] LOG: checkpoint starting: time
2008-01-25 11:33:21.706 NZDT [3432]: [12-1] LOG: checkpoint complete: wrote 672 buffers (16.4%); 0 transaction log file(s) added, 0 removed, 59 recycled; write=70.423 s, sync=1.562 s, total=73.375 s
2008-01-25 11:42:08.244 NZDT [3432]: [13-1] LOG: checkpoint starting: time
2008-01-25 11:42:27.010 NZDT [3432]: [14-1] LOG: checkpoint complete: wrote 175 buffers (4.3%); 0 transaction log file(s) added, 0 removed, 51 recycled; write=17.077 s, sync=1.204 s, total=18.813 s
2008-01-25 11:52:08.299 NZDT [3432]: [15-1] LOG: checkpoint starting: time
2008-01-25 11:52:33.627 NZDT [3432]: [16-1] LOG: checkpoint complete: wrote 233 buffers (5.7%); 0 transaction log file(s) added, 0 removed, 64 recycled; write=23.328 s, sync=1.468 s, total=25.391 s
TopMemoryContext: 49816 total in 6 blocks; 5656 free (6 chunks); 44160 used
RI compare cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
RI query cache: 8192 total in 1 blocks; 5968 free (0 chunks); 2224 used
TopTransactionContext: 8192 total in 1 blocks; 7792 free (0 chunks); 400 used
Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
Operator lookup cache: 24576 total in 2 blocks; 14072 free (6 chunks); 10504 used
MessageContext: 40960 total in 3 blocks; 19960 free (5 chunks); 21000 used
smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks); 5384 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
PortalHeapMemory: 1024 total in 1 blocks; 760 free (0 chunks); 264 used
ExecutorState: 2044715008 total in 270 blocks; 21056 free (262 chunks); 2044693952 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 667472 total in 20 blocks; 182800 free (1 chunks); 484672 used
location_ix: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
... [Cut 58 indexes with very similar lines to the above, to save space]
MdSmgr: 8192 total in 1 blocks; 7240 free (0 chunks); 952 used
LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
Timezones: 49432 total in 2 blocks; 5968 free (0 chunks); 43464 used
ErrorContext: 8192 total in 1 blocks; 8176 free (3 chunks); 16 used
2008-01-25 11:53:10.315 NZDT [3824]: [3-1] ERROR: out of memory
2008-01-25 11:53:10.362 NZDT [3824]: [4-1] DETAIL: Failed on request of size 28.
2008-01-25 11:53:10.362 NZDT [3824]: [5-1] STATEMENT: UPDATE document_file SET document_type_id = (SELECT document_type_id FROM document d where d.id = document_id);
2008-01-25 12:00:53.571 NZDT [3604]: [1-1] LOG: connection received: host=(removed) port=3399
2008-01-25 12:00:54.274 NZDT [3604]: [2-1] LOG: connection authorized: user=postgres database=(removed)
2008-01-25 12:00:55.727 NZDT [3604]: [3-1] LOG: duration: 1264.999 ms statement: SET DateStyle=ISO;SELECT oid, pg_encoding_to_char(encoding) AS encoding, datlastsysoid
FROM pg_database WHERE oid = 16466
2008-01-25 12:02:08.322 NZDT [3432]: [17-1] LOG: checkpoint starting: time
2008-01-25 12:07:03.591 NZDT [3432]: [18-1] LOG: checkpoint complete: wrote 2784 buffers (68.0%); 0 transaction log file(s) added, 0 removed, 92 recycled; write=292.488 s, sync=1.515 s, total=295.473 s
2008-01-25 12:10:07.031 NZDT [3604]: [4-1] LOG: duration: 539646.999 ms statement: select count(*) from document_file;
2008-01-25 12:12:08.048 NZDT [3432]: [19-1] LOG: checkpoint starting: time
2008-01-25 12:15:22.176 NZDT [3432]: [20-1] LOG: checkpoint complete: wrote 949 buffers (23.2%); 0 transaction log file(s) added, 0 removed, 8 recycled; write=193.097 s, sync=0.936 s, total=194.127 s

Environment:
OS: Windows XP
PostgreSQL: 8.3RC1

Non default Resource and WAL configuration settings:
shared_buffers = 32MB
max_fsm_pages = 204800
checkpoint_segments = 300
checkpoint_timeout = 10min

The previous query (not logged due to log_min_duration_statement = 500) had been:
ALTER TABLE document_file ADD document_type_id integer;

The query plan:
Seq Scan on document_file (cost=0.00..280337907.00 rows=27619541 width=617)
SubPlan
-> Index Scan using pk_document_id on document d (cost=0.00..10.12 rows=1 width=4)
Index Cond: (id = $0)

Stephen Denne

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________


From: Fernando Ike <fike(at)midstorm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 01:14:33
Message-ID: 20080124231433.2f312549@pmanson
Lists: pgsql-performance

On Fri, 25 Jan 2008 12:46:20 +1300,
"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> wrote:

> A simple update query, over roughly 17 million rows, populating a
> newly added column in a table, resulted in an out of memory error
> when the process memory usage reached 2GB. Could this be due to a
> poor choice of some configuration parameter, or is there a limit on
> how many rows I can update in a single statement?
>

I believe it is a platform problem, because on *nix this limit
doesn't occur. But I'm no Windows specialist.

Kind Regards,
--
Fernando Ike
http://www.midstorm.org/~fike/weblog


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 02:31:14
Message-ID: 21675.1201228274@sss.pgh.pa.us
Lists: pgsql-performance

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> A simple update query, over roughly 17 million rows, populating a newly added column in a table, resulted in an out of memory error when the process memory usage reached 2GB. Could this be due to a poor choice of some configuration parameter, or is there a limit on how many rows I can update in a single statement?

Do you have any triggers or foreign keys on that table? For that
matter, let's see its whole schema definition.

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 03:20:49
Message-ID: F0238EBA67824444BC1CB4700960CB480482B5B7@dmpeints002.isotach.com
Lists: pgsql-performance

> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > A simple update query, over roughly 17 million rows,
> > populating a newly added column in a table, resulted in an
> > out of memory error when the process memory usage reached
> > 2GB. Could this be due to a poor choice of some configuration
> > parameter, or is there a limit on how many rows I can update
> > in a single statement?
>
> Do you have any triggers or foreign keys on that table? For that
> matter, let's see its whole schema definition.
>
> regards, tom lane

No triggers on that table, one primary key, one foreign key, two indexes.
The foreign key references a primary key which is also an integer.
No other tables which reference document_file.
No inherited tables.
There are as many document_file rows as there are rows in the document table,
document_file.document_id is unique, though not constrained.
(Designed as a one to many relationship, but only ever used as one to one.)

I altered the update statement slightly, and reran the query.

I disabled autovacuum after a while and cancelled the autovacuum process that was trying to vacuum analyze document_file.

The altered query has been running over 3 hours now,
without using lots of memory (38M private bytes).
2046 temp files were created (2.54GB worth),
which have recently changed from slowly growing in size
to very very slowly reducing in number.

Altered query that has not crashed:
UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;

Hash Join (cost=674810.80..6701669.63 rows=16972702 width=621)
Hash Cond: (df.document_id = d.id)
-> Seq Scan on document_file df (cost=0.00..750298.65 rows=27702365 width=617)
-> Hash (cost=396352.02..396352.02 rows=16972702 width=8)
-> Seq Scan on document d (cost=0.00..396352.02 rows=16972702 width=8)

c.f. original (re-explained):
UPDATE document_file SET document_type_id = (SELECT document_type_id FROM document d where d.id = document_id);

Seq Scan on document_file (cost=0.00..281183329.64 rows=27702834 width=617)
SubPlan
-> Index Scan using pk_document_id on document d (cost=0.00..10.12 rows=1 width=4)
Index Cond: (id = $0)
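
[Editor's note: for readers hitting the same wall, one common workaround not shown in the thread is to split such an update into bounded batches. The range boundaries below are hypothetical; this is a sketch that assumes document_id is roughly dense.]

```sql
-- Hypothetical batching sketch: update document_file in slices of
-- document_id, committing between slices, so no single statement
-- has to account for all 17 million rows at once.
BEGIN;
UPDATE ONLY document_file AS df
   SET document_type_id = d.document_type_id
  FROM document AS d
 WHERE d.id = df.document_id
   AND df.document_id >= 0
   AND df.document_id < 1000000;
COMMIT;
-- ...repeat with the next range of document_id,
-- until the full key range is covered.
```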

Schema as reported by pgadmin:

CREATE TABLE document_file
(
id integer NOT NULL DEFAULT nextval(('document_file_seq'::text)::regclass),
document_id integer NOT NULL,
archive_directory_location character varying(255) NOT NULL,
mime_type character varying(255),
file_name character varying(255) NOT NULL,
life_cycle_status character varying(255),
version integer DEFAULT 0,
is_current boolean DEFAULT true,
file_size integer NOT NULL,
document_type_id integer,
CONSTRAINT pk_document_file_id PRIMARY KEY (id),
CONSTRAINT fk_document_id FOREIGN KEY (document_id)
REFERENCES document (id) MATCH SIMPLE
ON UPDATE NO ACTION ON DELETE CASCADE
)
WITH (OIDS=FALSE);
ALTER TABLE document_file OWNER TO postgres;
GRANT ALL ON TABLE document_file TO postgres;
GRANT ALL ON TABLE document_file TO vapps;
GRANT ALL ON TABLE document_file TO vrconfig;

CREATE INDEX location_ix
ON document_file
USING btree
(archive_directory_location);

CREATE INDEX tc_file_document
ON document_file
USING btree
(document_id);



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 04:50:25
Message-ID: 22975.1201236625@sss.pgh.pa.us
Lists: pgsql-performance

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> I altered the update statement slightly, and reran the query.
> The altered query has been running over 3 hours now,
> without using lots of memory (38M private bytes).
> 2046 temp files were created (2.54GB worth),
> which have recently changed from slowly growing in size
> to very very slowly reducing in number.

Hmm. I think what that really means is you haven't got to the part of
the query where the leak is :-(. In my attempt to reproduce this
I found that 8.3 has introduced a memory leak into the RI trigger
support, such that even if an UPDATE doesn't change the FK columns
it's still likely to leak a few dozen bytes per updated row.

Please see if the attached patch makes it better for you.

regards, tom lane

Attachment Content-Type Size
unknown_filename text/plain 2.3 KB

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 10:14:40
Message-ID: F0238EBA67824444BC1CB4700960CB48036E81E2@dmpeints002.isotach.com
Lists: pgsql-performance

I don't have a PostgreSQL build environment.

It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

Regards,
Stephen Denne.




From: cgallant(at)gmail(dot)com
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 18:32:46
Message-ID: 20080125183246.GB32623@soleauthority.ca
Lists: pgsql-performance

>
> > A simple update query, over roughly 17 million rows, populating a
> > newly added column in a table, resulted in an out of memory error
> > when the process memory usage reached 2GB. Could this be due to a
> > poor choice of some configuration parameter, or is there a limit on
> > how many rows I can update in a single statement?
> >
>
> I believe it is a platform problem, because on *nix this limit
> doesn't occur. But I'm no Windows specialist.

On most Windows Server editions (except for a few variants), 2GB is the
most a single process can address without booting the machine with the
/3GB switch, which allows allocating up to 3GB per process. That is the
limit unless you get special versions of Windows Server 2003, as far as
I know. A Google search for the /3GB switch on Windows will turn up the
details.

Hope that helps a bit.

Cheers,

---
Curtis Gallant
cgallant(at)gmail(dot)com


From: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
To: <cgallant(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 18:36:28
Message-ID: 1A6E6D554222284AB25ABE3229A927627154CE@nrtexcus702.int.asurion.com
Lists: pgsql-performance

> Subject: Re: [PERFORM] 8.3rc1 Out of memory when performing update
>
> >
> > > A simple update query, over roughly 17 million rows, populating a
> > > newly added column in a table, resulted in an out of memory error
> > > when the process memory usage reached 2GB. Could this be due to a
> > > poor choice of some configuration parameter, or is there a limit
> > > on how many rows I can update in a single statement?
> > >
> >
> > I believe it is a platform problem, because on *nix this limit
> > doesn't occur. But I'm no Windows specialist.
>
> On most Windows Server editions (except for a few variants), 2GB is
> the most a single process can address without booting the machine
> with the /3GB switch, which allows allocating up to 3GB per process.
> That is the limit unless you get special versions of Windows Server
> 2003, as far as I know. A Google search for the /3GB switch on
> Windows will turn up the details.

32-bit Windows is limited to 2 or 3 GB as you state, but 64-bit Windows
isn't. 32-bit Linux has similar limits too.

Jon


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: "Roberts, Jon" <Jon(dot)Roberts(at)asurion(dot)com>
Cc: cgallant(at)gmail(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-25 18:45:43
Message-ID: 479A2E57.3060405@hagander.net
Lists: pgsql-performance

Roberts, Jon wrote:
>> Subject: Re: [PERFORM] 8.3rc1 Out of memory when performing update
>>
>>>> A simple update query, over roughly 17 million rows, populating a
>>>> newly added column in a table, resulted in an out of memory error
>>>> when the process memory usage reached 2GB. Could this be due to a
>>>> poor choice of some configuration parameter, or is there a limit
>>>> on how many rows I can update in a single statement?
>>>>
>>> I believe it is a platform problem, because on *nix this limit
>>> doesn't occur. But I'm no Windows specialist.
>> On most Windows Server editions (except for a few variants), 2GB is
>> the most a single process can address without booting the machine
>> with the /3GB switch, which allows allocating up to 3GB per process.
>> That is the limit unless you get special versions of Windows Server
>> 2003, as far as I know. A Google search for the /3GB switch on
>> Windows will turn up the details.
>
> 32-bit Windows is limited to 2 or 3 GB as you state, but 64-bit Windows
> isn't. 32-bit Linux has similar limits too.

Well, PostgreSQL on Windows is a 32-bit binary, so the limit applies to
this case.

//Magnus


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-27 21:55:10
Message-ID: F0238EBA67824444BC1CB4700960CB480482B748@dmpeints002.isotach.com
Lists: pgsql-performance

>>"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
>>> I altered the update statement slightly, and reran the query.
>>> The altered query has been running over 3 hours now,
>>> without using lots of memory (38M private bytes).
>>> 2046 temp files were created (2.54GB worth),
>>> which have recently changed from slowly growing in size
>>> to very very slowly reducing in number.

To which Tom Lane replied:
>>Hmm. I think what that really means is you haven't got to the part of
>>the query where the leak is :-(.

I then said:
> It is now Friday night for me. I left the alternate query running, and will find out on Monday what happened.

Well, it is now Monday morning for me, and those temp files are still slowly reducing in number.
There are now only 1629 of them left, so I'm guessing that the query is about 20% done.
The PC seems to have been steadily but very slowly working away at this very simple query for close to 70 hours.
I decided not to leave this query running for a fortnight to find out if I then strike the memory leak.
Private Bytes had grown to 685MB.
I cancelled the query.

Rough snapshot of what was happening with IO (a single 7200 IDE disk):

The process for the update query was reading about 500KB/second, writing between 80KB/second and 200KB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was writing about 400KB/second
Checkpoints were 10 minutes apart, taking about 85 seconds to write 1000 buffers.

What could cause such poor performance?
I presume that the disk was being forced to move the head a great deal.

I also asked:
> If I drop the fk constraint, and/or its index, would I still be affected by the leak you found?

I dropped two indexes and one fk constraint and ran VACUUM FULL VERBOSE ANALYZE document_file;
As an indication of the disk performance: at its peak the vacuum process was reading and writing 20MB/seconds (sustained), completing in less than 11 minutes.
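
[Editor's note: the thread doesn't show the exact commands used here. Using the index and constraint names from the schema posted earlier, the drop-and-recreate sequence would have looked roughly like this sketch:]

```sql
-- Drop the foreign key and the two secondary indexes before the bulk
-- update, so the UPDATE neither fires RI checks nor maintains them:
ALTER TABLE document_file DROP CONSTRAINT fk_document_id;
DROP INDEX location_ix;
DROP INDEX tc_file_document;

VACUUM FULL VERBOSE ANALYZE document_file;

-- ...run the UPDATE, then restore the constraint and indexes afterwards:
ALTER TABLE document_file
  ADD CONSTRAINT fk_document_id FOREIGN KEY (document_id)
      REFERENCES document (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE CASCADE;
CREATE INDEX location_ix ON document_file USING btree (archive_directory_location);
CREATE INDEX tc_file_document ON document_file USING btree (document_id);
```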

I reran the original query.
It used constant memory (6.7MB private bytes)
It was reading 2 to 3MB/second, writing 3 to 6MB/second.
The stats collector process was writing about 100KB/second
The wal writer process was writing about 200KB/second
The writer process was initially writing about 1MB/second, increasing to about 3MB/second

Checkpoints in the middle of this query were taking up to 13 seconds to write 100 buffers.

The checkpoint after the query took 300 seconds (exactly half the checkpoint interval), and was writing about 60KB/second. It wrote 2148 buffers.

So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this confirm that the memory leak you found is the one I was suffering from?

I'd also class the extremely poor performance of the alternate query as a bug.
Why take a fortnight when you could take three quarters of an hour? (Granted, there were two fewer indexes to update, but that is still too long.)

Aside: I must say that I am impressed with PostgreSQL's handling of this connection. It recovers extremely well from running out of memory, cancelling very long running queries, reloading config (to turn autovacuum off), and continues to work as expected (the 3 day old connection that is).

Stephen Denne.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-28 04:16:20
Message-ID: 17436.1201493780@sss.pgh.pa.us
Lists: pgsql-performance

"Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> So dropping the fk constraint and index results in successful query execution with constant memory usage. Does this confirm that the memory leak you found is the one I was suffering from?

Well, it confirms that you were suffering from that memory leak. What's
not clear is whether the leak completely explains the bad performance
you saw. The leak would have resulted in extra swapping, but I wouldn't
have thought it would drive the machine completely into swap hell.
Would the monitoring tools you were using have shown swapping I/O?

If you want to try a focused experiment, please apply the patch shown
here:
http://archives.postgresql.org/pgsql-committers/2008-01/msg00322.php
and see if the behavior gets better.

> I'd also class the extremely poor performance of the alternate query as a bug.

Yeah, we should look into that. The plan you showed before estimated
about 16.9M rows in "document" --- is that about right? What have you
got work_mem set to?

> Aside: I must say that I am impressed with PostgreSQL's handling of this connection. It recovers extremely well from running out of memory, cancelling very long running queries, reloading config (to turn autovacuum off), and continues to work as expected (the 3 day old connection that is).

Hey, we've had plenty of opportunity to improve the system's robustness
in the face of such things ;-)

regards, tom lane


From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-28 05:04:00
Message-ID: F0238EBA67824444BC1CB4700960CB48048E0ABA@dmpeints002.isotach.com
Lists: pgsql-performance

Tom Lane wrote:
> "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz> writes:
> > So dropping the fk constraint and index results in
> successful query execution with constant memory usage. Does
> this confirm that the memory leak you found is the one I was
> suffering from?
>
> Well, it confirms that you were suffering from that memory
> leak. What's
> not clear is whether the leak completely explains the bad performance
> you saw. The leak would have resulted in extra swapping, but
> I wouldn't
> have thought it would drive the machine completely into swap hell.

The query that crashed from out of memory did so after an hour,
which isn't bad performance, given that the workaround with fewer indexes to update succeeded after 45 minutes.

It was the rewritten one which I killed after 3 days.

> Would the monitoring tools you were using have shown swapping I/O?

I was using Process Explorer, which shows page faults and deltas,
which are not included in the read & write IO stats.
The query with poor IO performance wasn't swapping.

> > I'd also class the extremely poor performance of the
> alternate query as a bug.
>
> Yeah, we should look into that. The plan you showed before estimated
> about 16.9M rows in "document" --- is that about right? What have you
> got work_mem set to?

Yes, 16894164 rows.
Exactly the same number of rows in document as in document_file.
[count(*) queries taking 38 and 63 seconds]

work_mem appears to be left as the default 1MB
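
[Editor's note: work_mem can be raised per session before rerunning a query like this, without touching postgresql.conf. The value below is hypothetical; the right setting depends on available RAM and concurrent connections.]

```sql
-- With only 1MB of work_mem, a 17M-row hash join spills to thousands
-- of temp files; a larger per-session value lets more of the hash
-- table stay in memory and reduces the number of batches.
SET work_mem = '256MB';

-- Re-check the plan (and batch behaviour) before committing to the run:
EXPLAIN
UPDATE ONLY document_file AS df
   SET document_type_id = d.document_type_id
  FROM document AS d
 WHERE d.id = df.document_id;

RESET work_mem;
```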

I get 1023 temp files created straight away, which take four minutes (250s) to grow to about 448K each
(reading @ 5MB/s writing @ 2MB/s)
memory usage during this first phase slowly increased from 13.4M to 14.4M
then 1023 more temp files are created, and they grow to about 2170K each
(reading @ 2MB/s writing @ 2MB/s until the checkpoint starts, when the speed decreases to 200K/s, and doesn't increase again after the checkpoint finishes.)
memory usage during this second phase slowly increased from 22.5M to 26.4M
My concern is with what it then does. (Spends a fortnight doing really slow IO)

An hour's worth of logs from during this phase show 6 checkpoints, and 6 temp files reported (which seems to coincide with them being deleted):

2008-01-26 06:02:08.086 NZDT [3432]: [233-1] LOG: checkpoint starting: time
2008-01-26 06:03:28.916 NZDT [3432]: [234-1] LOG: checkpoint complete: wrote 899 buffers (21.9%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=77.798 s, sync=2.750 s, total=80.830 s
2008-01-26 06:12:08.094 NZDT [3432]: [235-1] LOG: checkpoint starting: time
2008-01-26 06:12:23.407 NZDT [3824]: [209-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1321", size 2224520
2008-01-26 06:12:23.407 NZDT [3824]: [210-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:12:24.157 NZDT [3824]: [211-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.477", size 461356
2008-01-26 06:12:24.157 NZDT [3824]: [212-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:13:21.876 NZDT [3432]: [236-1] LOG: checkpoint complete: wrote 724 buffers (17.7%); 0 transaction log file(s) added, 0 removed, 17 recycled; write=71.500 s, sync=2.108 s, total=73.781 s
2008-01-26 06:22:08.024 NZDT [3432]: [237-1] LOG: checkpoint starting: time
2008-01-26 06:23:25.415 NZDT [3432]: [238-1] LOG: checkpoint complete: wrote 877 buffers (21.4%); 0 transaction log file(s) added, 0 removed, 11 recycled; write=74.141 s, sync=2.985 s, total=77.391 s
2008-01-26 06:29:36.311 NZDT [3824]: [213-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1350", size 2220990
2008-01-26 06:29:36.311 NZDT [3824]: [214-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:29:36.982 NZDT [3824]: [215-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.516", size 463540
2008-01-26 06:29:36.982 NZDT [3824]: [216-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:32:08.016 NZDT [3432]: [239-1] LOG: checkpoint starting: time
2008-01-26 06:33:19.501 NZDT [3432]: [240-1] LOG: checkpoint complete: wrote 872 buffers (21.3%); 0 transaction log file(s) added, 0 removed, 15 recycled; write=69.062 s, sync=2.171 s, total=71.484 s
2008-01-26 06:42:08.101 NZDT [3432]: [241-1] LOG: checkpoint starting: time
2008-01-26 06:43:27.431 NZDT [3432]: [242-1] LOG: checkpoint complete: wrote 813 buffers (19.8%); 0 transaction log file(s) added, 0 removed, 14 recycled; write=76.579 s, sync=2.592 s, total=79.329 s
2008-01-26 06:46:45.558 NZDT [3824]: [217-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.1940", size 2229130
2008-01-26 06:46:45.558 NZDT [3824]: [218-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:46:46.246 NZDT [3824]: [219-1] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp3824.631", size 459564
2008-01-26 06:46:46.246 NZDT [3824]: [220-1] STATEMENT: UPDATE ONLY document_file AS df SET document_type_id = d.document_type_id FROM document AS d WHERE d.id = document_id;
2008-01-26 06:52:08.078 NZDT [3432]: [243-1] LOG: checkpoint starting: time
2008-01-26 06:53:31.173 NZDT [3432]: [244-1] LOG: checkpoint complete: wrote 983 buffers (24.0%); 0 transaction log file(s) added, 0 removed, 13 recycled; write=78.203 s, sync=4.641 s, total=83.094 s

Stephen Denne.



From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>, pgsql-performance(at)postgresql(dot)org
Subject: Re: 8.3rc1 Out of memory when performing update
Date: 2008-01-28 08:18:08
Message-ID: 1d4e0c10801280018x17103cc8o7ead9fdc75d69470@mail.gmail.com
Lists: pgsql-performance

On Jan 25, 2008 5:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hmm. I think what that really means is you haven't got to the part of
> the query where the leak is :-(. In my attempt to reproduce this
> I found that 8.3 has introduced a memory leak into the RI trigger
> support, such that even if an UPDATE doesn't change the FK columns
> it's still likely to leak a few dozen bytes per updated row.
>
> Please see if the attached patch makes it better for you.

Just FYI, somebody on #postgresql had the exact same problem of
memleaks during update yesterday and your patch fixed it for him too.

--
Guillaume