BUG #5779: Large INSERT transaction with FOREIGN KEY constraints exhausts available RAM

Lists: pgsql-bugs
From: "Michel Alexandre Salim" <michel(dot)salim(at)cs(dot)fau(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5779: Large INSERT transaction with FOREIGN KEY constraints exhausts available RAM
Date: 2010-12-01 08:09:03
Message-ID: 201012010809.oB18933X096032@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 5779
Logged by: Michel Alexandre Salim
Email address: michel(dot)salim(at)cs(dot)fau(dot)de
PostgreSQL version: 9.0.1
Operating system: Ubuntu 10.04
Description: Large INSERT transaction with FOREIGN KEY constraints
exhausts available RAM
Details:

Explanation:
I have a table with close to a billion rows that was initially imported in
csv form; as such, a lot of the columns were not normalized. For speed
reasons, I created one temporary table for each column that needs to be
normalized, and then did an INSERT with JOINs to populate the new table with
the normalized data.

If foreign key constraints are enabled, this runs out of memory (the machine
has 4 GB RAM + 6 GB swap, and nothing else running requires much RAM; 4 GB of
the swap was actually added to try to debug this). With vm.overcommit_memory
set to the default 0, the OOM killer consistently kills the process handling
the query; with it set to the recommended 2, PostgreSQL itself aborts the
query when it cannot allocate more RAM.
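(For reference, the overcommit setting mentioned above is a kernel config
change; this is a sketch of how it is typically applied, not part of the
original report. Mode 2 disables overcommit, so a failed allocation is
reported back to PostgreSQL instead of the OOM killer picking a victim.)

```
# Requires root; persists only until reboot unless also added to /etc/sysctl.conf.
sysctl -w vm.overcommit_memory=2
```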

I tried changing the fkey constraints to DEFERRABLE, hoping that would
consume less RAM, but the result is the same. Why should the memory usage
pattern differ between integrity checks done as part of the transaction
(even when deferred to the end of it) and checks done in a separate
transaction?

The dataset I'm using is the Eclipse UDC dataset for 2009 --
http://archive.eclipse.org/technology/phoenix/usagedata/ -- and the problem
is independently described at this other site, with a simpler query:

http://www.jory.info/serendipity/archives/30-PostgreSQL-Out-of-Memory-with-large-INSERT.html

My own invocations were:

CREATE TABLE udc (
    id INTEGER PRIMARY KEY,
    eclipse_id INTEGER NOT NULL,
    action_type_id INTEGER NOT NULL,
    target_type_id INTEGER,
    source_id INTEGER NOT NULL,
    version_id INTEGER NOT NULL,
    target_id INTEGER,
    tstamp TIMESTAMPTZ NOT NULL,
    FOREIGN KEY (action_type_id) REFERENCES actions (id) DEFERRABLE,
    FOREIGN KEY (target_type_id) REFERENCES class_types (id) DEFERRABLE,
    FOREIGN KEY (source_id) REFERENCES classes (id) DEFERRABLE,
    FOREIGN KEY (version_id) REFERENCES versions (id) DEFERRABLE,
    FOREIGN KEY (target_id) REFERENCES classes (id) DEFERRABLE
);

NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "udc_pkey" for table "udc"

CREATE TABLE

udc=# INSERT INTO udc (id, eclipse_id, action_type_id, target_type_id,
          source_id, version_id, target_id, tstamp)
      SELECT old_udc.id, old_udc.eclipse_id, old_udc.action_type,
          newttype.type_id, old_udc.source, newv.v_id, newt.t, old_udc.tstamp
      FROM old_udc NATURAL JOIN newttype NATURAL JOIN newv NATURAL JOIN newt;

ERROR: out of memory
DETAIL: Failed on request of size 1048576.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michel Alexandre Salim <michel(dot)salim(at)cs(dot)fau(dot)de>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5779: Large INSERT transaction with FOREIGN KEY constraints exhausts available RAM
Date: 2010-12-14 20:44:47
Message-ID: AANLkTi=51_6R2Smdit3a-sxGt2_4iFreSeYtkgB22N4b@mail.gmail.com
Lists: pgsql-bugs

On Wed, Dec 1, 2010 at 3:09 AM, Michel Alexandre Salim
<michel(dot)salim(at)cs(dot)fau(dot)de> wrote:
> I tried changing the fkey constraints to DEFERRABLE hoping that that would
> consume less RAM, but the same result occurs. Why should the memory usage
> pattern be different when integrity checks are done as part of the
> transaction (even when pushed back to the end), and in a separate
> transaction?

I'm just guessing here, but it may be that the trigger queue is what's
filling up all the memory. I'm guessing that a trigger event is
getting queued for each row you INSERT. But when you add the foreign
key later, it does a bulk validation of the entire table instead of
validating each individual row.

If that really is what's going on here, it's a known shortcoming of
the existing implementation which, unfortunately, no one has gotten
around to fixing (partly because it's not entirely obvious what the
design should be).
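If the trigger queue really is what fills memory, the pattern Robert
describes suggests a workaround: load the data into a table that has no
foreign keys yet, then add the constraints afterwards so each one is
validated in a single full-table pass instead of queuing one trigger event
per inserted row. A sketch, reusing the schema from the original report
(whether this actually avoids the memory blow-up in this case is an
assumption, not something confirmed in the thread):

```
-- 1. Create udc without the FOREIGN KEY clauses, then bulk-load it;
--    with no FK triggers on the table, no per-row events are queued.
INSERT INTO udc (id, eclipse_id, action_type_id, target_type_id,
                 source_id, version_id, target_id, tstamp)
SELECT old_udc.id, old_udc.eclipse_id, old_udc.action_type, newttype.type_id,
       old_udc.source, newv.v_id, newt.t, old_udc.tstamp
FROM old_udc NATURAL JOIN newttype NATURAL JOIN newv NATURAL JOIN newt;

-- 2. Add the constraints afterwards; each ADD ... FOREIGN KEY validates the
--    whole table in one scan rather than row by row.
ALTER TABLE udc ADD FOREIGN KEY (action_type_id) REFERENCES actions (id);
ALTER TABLE udc ADD FOREIGN KEY (target_type_id) REFERENCES class_types (id);
ALTER TABLE udc ADD FOREIGN KEY (source_id) REFERENCES classes (id);
ALTER TABLE udc ADD FOREIGN KEY (version_id) REFERENCES versions (id);
ALTER TABLE udc ADD FOREIGN KEY (target_id) REFERENCES classes (id);
```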

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company