Re: BUG #4204: COPY to table with FK has memory leak

From: "Tomasz Rybak" <bogomips(at)post(dot)pl>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-27 19:14:23
Message-ID: 200805271914.m4RJEN0r012867@wwwmaster.postgresql.org
Lists: pgsql-bugs pgsql-hackers


The following bug has been logged online:

Bug reference: 4204
Logged by: Tomasz Rybak
Email address: bogomips(at)post(dot)pl
PostgreSQL version: 8.3.1
Operating system: Debian Linux
Description: COPY to table with FK has memory leak
Details:

I tried to use COPY to import 27M rows to table:
CREATE TABLE sputnik.sputnik (
id INTEGER,
sequence BIGINT,
strength INTEGER,
time TIMESTAMP WITH TIME ZONE,
tags TEXT ARRAY[1]
);
CREATE TABLE sputnik.ccc24 (
station CHARACTER(4) NOT NULL REFERENCES sputnik.station24 (id),
moment INTEGER NOT NULL,
flags INTEGER NOT NULL
) INHERITS (sputnik.sputnik);
COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
FROM '/tmp/24c3' WITH DELIMITER AS ' ';

When importing on amd64 with 1G RAM and 1G swap it takes the entire RAM+swap and
gets killed by the OOM-killer.
After dividing the data into three parts I was able to
import it.
The kernel kills the server process, not psql.

On i386 with 512M RAM and 1G swap it works, but
takes the entire RAM and about 700MB of swap.
The amount of used swap changes, oscillating between
600M and 900M for almost the entire time of the import.

So on a 32-bit system, where pointers are smaller,
I was able to import; on 64-bit, PostgreSQL gets killed.

When I dropped the FK from sputnik.ccc24, used COPY, and then re-created the FK,
everything was correct, and PostgreSQL didn't even touch swap.
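
A minimal sketch of that load pattern in SQL, assuming a constraint named
ccc24_station_fkey (the actual constraint name is not given in the report):

BEGIN;
-- drop the FK so COPY does not queue one check per row
ALTER TABLE sputnik.ccc24 DROP CONSTRAINT ccc24_station_fkey;
COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
    FROM '/tmp/24c3' WITH DELIMITER AS ' ';
-- re-create the FK afterwards; the constraint is then verified in one pass
ALTER TABLE sputnik.ccc24
    ADD CONSTRAINT ccc24_station_fkey
    FOREIGN KEY (station) REFERENCES sputnik.station24 (id);
COMMIT;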

In January, on 8.3rc2, I also had a problem with the same data set; when I
tried to update:
UPDATE sputnik.ccc24 SET time = to_timestamp(moment);
my PostgreSQL server was also killed by the OOM-killer.
Then a patch from Tom Lane for ri_FetchConstraintInfo in
src/backend/utils/adt/ri_triggers.c (1.102->1.102) helped.

I think the situation here may be similar.
I even tried to write a patch:
--- src/backend/executor/execMain.c 2008-02-03 12:12:01.000000000 +0100
+++ src/backend/executor/execMain.c 2008-02-03 12:13:09.000000000 +0100
@@ -1905,6 +1905,7 @@
return check[i].ccname;
}

+ FreeExprContext(econtext);
/* NULL result means no error */
return NULL;
}

but after this patch 59 regression tests failed.

But I still think that some of the code responsible for checking constraints does
not free unused memory.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Tomasz Rybak" <bogomips(at)post(dot)pl>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-27 21:36:39
Message-ID: 13982.1211924199@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

"Tomasz Rybak" <bogomips(at)post(dot)pl> writes:
> I tried to use COPY to import 27M rows to table:
> CREATE TABLE sputnik.ccc24 (
> station CHARACTER(4) NOT NULL REFERENCES sputnik.station24 (id),
> moment INTEGER NOT NULL,
> flags INTEGER NOT NULL
> ) INHERITS (sputnik.sputnik);
> COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
> FROM '/tmp/24c3' WITH DELIMITER AS ' ';

This is expected to take lots of memory because each row-requiring-check
generates an entry in the pending trigger event list. Even if you had
not exhausted memory, the actual execution of the retail checks would
have taken an unreasonable amount of time. The recommended way to do
this sort of thing is to add the REFERENCES constraint *after* you load
all the data; that'll be a lot faster in most cases because the checks
are done "in bulk" using a JOIN rather than one-at-a-time.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 18:22:36
Message-ID: 873ao2qoer.fsf@oxford.xeocode.com
Lists: pgsql-bugs pgsql-hackers


[moving to -hackers]

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> "Tomasz Rybak" <bogomips(at)post(dot)pl> writes:
>> I tried to use COPY to import 27M rows to table:
>> CREATE TABLE sputnik.ccc24 (
>> station CHARACTER(4) NOT NULL REFERENCES sputnik.station24 (id),
>> moment INTEGER NOT NULL,
>> flags INTEGER NOT NULL
>> ) INHERITS (sputnik.sputnik);
>> COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
>> FROM '/tmp/24c3' WITH DELIMITER AS ' ';
>
> This is expected to take lots of memory because each row-requiring-check
> generates an entry in the pending trigger event list. Even if you had
> not exhausted memory, the actual execution of the retail checks would
> have taken an unreasonable amount of time. The recommended way to do
> this sort of thing is to add the REFERENCES constraint *after* you load
> all the data; that'll be a lot faster in most cases because the checks
> are done "in bulk" using a JOIN rather than one-at-a-time.

Hm, it occurs to me that we could still do a join against the pending event
trigger list... I wonder how feasible it would be to store the pending trigger
event list in a temporary table instead of in ram.
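
A rough sketch of that idea in plain SQL, using a hypothetical spill table for
the keys of the pending events (all names here are illustrative):

CREATE TEMP TABLE pending_fk_checks (station character(4));
-- the trigger machinery would append one row per pending check, then run:
SELECT p.station
  FROM pending_fk_checks p
  LEFT JOIN sputnik.station24 s ON s.id = p.station
 WHERE s.id IS NULL
 LIMIT 1;  -- one returned row is enough to report the violation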

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 20:28:27
Message-ID: 5183.1212006507@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> This is expected to take lots of memory because each row-requiring-check
>> generates an entry in the pending trigger event list.

> Hm, it occurs to me that we could still do a join against the pending event
> trigger list... I wonder how feasible it would be to store the pending trigger
> event list in a temporary table instead of in ram.

We could make that list spill to disk, but the problem remains that
verifying the rows one at a time will take forever.

The idea that's been kicked around occasionally is that once you get
past N pending events, throw them all away and instead queue a single
operation to do a bulk verify (just like initial establishment of the
FK constraint). I'm not sure how to do the queue management for this
though.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 20:47:35
Message-ID: 87d4n6p34o.fsf@oxford.xeocode.com
Lists: pgsql-bugs pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>>> This is expected to take lots of memory because each row-requiring-check
>>> generates an entry in the pending trigger event list.
>
>> Hm, it occurs to me that we could still do a join against the pending event
>> trigger list... I wonder how feasible it would be to store the pending trigger
>> event list in a temporary table instead of in ram.
>
> We could make that list spill to disk, but the problem remains that
> verifying the rows one at a time will take forever.

Well I was thinking if we did a join between a temporary table and the fk
target then it wouldn't have to be a one-by-one operation. It could be a merge
join if the planner thought that was better. How to get accurate stats into
the planner at that point would be a missing detail though.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 21:45:37
Message-ID: 1212011137.4489.700.camel@ebony.site
Lists: pgsql-bugs pgsql-hackers


On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> >> This is expected to take lots of memory because each row-requiring-check
> >> generates an entry in the pending trigger event list.
>
> > Hm, it occurs to me that we could still do a join against the pending event
> > trigger list... I wonder how feasible it would be to store the pending trigger
> > event list in a temporary table instead of in ram.
>
> We could make that list spill to disk, but the problem remains that
> verifying the rows one at a time will take forever.
>
> The idea that's been kicked around occasionally is that once you get
> past N pending events, throw them all away and instead queue a single
> operation to do a bulk verify (just like initial establishment of the
> FK constraint). I'm not sure how to do the queue management for this
> though.

Neither of those approaches is really suitable. Just spilling to disk is
O(N) in the number of rows loaded, and the second one is O(N) at least in
the number of rows (loaded + existing). The second one doesn't help
either, since if the table were empty you'd have added the FK afterwards,
so we must assume there are already rows in there, and in most cases the rows
already loaded will exceed those being added by the bulk operation.

AFAICS we must aggregate the trigger checks. We would need a special
property of triggers that allowed them to be aggregated when two similar
checks arrived. We can then use hash aggregation to accumulate them. We
might conceivably need to spill to disk also, since the aggregation may
not always be effective. But in most cases the tables against which FK
checks are made are significantly smaller than the tables being loaded.
Once we have hash aggregated them, that is then the first part of a hash
join to the target table.
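
In SQL terms the aggregation step might look like this (pending_fk_checks is
again a hypothetical accumulation of the checked keys, not an existing table):

SELECT p.station
  FROM pending_fk_checks p   -- keys accumulated from the loaded rows
EXCEPT                       -- deduplicates, so each distinct key is checked once
SELECT s.id
  FROM sputnik.station24 s;  -- any key that survives violates the constraint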

We certainly need a TODO item for "improve RI checks during bulk
operations".

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 22:17:23
Message-ID: 87prr6nkek.fsf@oxford.xeocode.com
Lists: pgsql-bugs pgsql-hackers


"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:

> AFAICS we must aggregate the trigger checks. We would need a special
> property of triggers that allowed them to be aggregated when two similar
> checks arrived. We can then use hash aggregation to accumulate them. We
> might conceivably need to spill to disk also, since the aggregation may
> not always be effective. But in most cases the tables against which FK
> checks are made are significantly smaller than the tables being loaded.
> Once we have hash aggregated them, that is then the first part of a hash
> join to the target table.

Well we can't aggregate them as they're created because later modifications
could delete or update the original records. The SQL spec requires that FK
checks be effective at the end of the command.

I admit off the top of my head I can't actually come up with any situations
which would be covered by the spec. All the instances I can think of involve
either Postgres's UPDATE FROM, plpgsql functions, or some other Postgres-specific
functionality. But I do seem to recall there were some situations
where it mattered.

But we could aggregate them when it comes time to actually check them. Or we
could hash the FK keys and scan the event list. Or we could sort the two and
merge join them....

> We certainly need a TODO item for "improve RI checks during bulk
> operations".

I have a feeling it's already there. Hm. There's a whole section on RI
triggers, but the closest I see is this, and neither of the links appears to refer
to bulk operations:

Optimize referential integrity checks

http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-28 22:42:06
Message-ID: 10598.1212014526@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>> We certainly need a TODO item for "improve RI checks during bulk
>> operations".

> I have a feeling it's already there. Hm. There's a whole section on RI
> triggers but the closest I see is this, neither of the links appear to refer
> to bulk operations:

> Optimize referential integrity checks
> http://archives.postgresql.org/pgsql-performance/2005-10/msg00458.php
> http://archives.postgresql.org/pgsql-hackers/2007-04/msg00744.php

No, both of those are talking about the same thing, ie, (1) making the
are-the-keys-unchanged optimization work when NULLs are present,
and (2) not testing for this case twice.

There's an entry in the Triggers section

* Add deferred trigger queue file

Right now all deferred trigger information is stored in backend
memory. This could exhaust memory for very large trigger queues.
This item involves dumping large queues into files.

but as already noted, this is a pretty myopic answer (at least for
RI triggers).

regards, tom lane


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-29 05:41:44
Message-ID: 1212039704.4489.724.camel@ebony.site
Lists: pgsql-bugs pgsql-hackers


On Wed, 2008-05-28 at 18:17 -0400, Gregory Stark wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>
> > AFAICS we must aggregate the trigger checks. We would need a special
> > property of triggers that allowed them to be aggregated when two similar
> > checks arrived. We can then use hash aggregation to accumulate them. We
> > might conceivably need to spill to disk also, since the aggregation may
> > not always be effective. But in most cases the tables against which FK
> > checks are made are significantly smaller than the tables being loaded.
> > Once we have hash aggregated them, that is then the first part of a hash
> > join to the target table.
>
> Well we can't aggregate them as they're created because later modifications
> could delete or update the original records. The SQL spec requires that FK
> checks be effective at the end of the command.

Well, that's what we need to do. We just need to find a way...

Currently, we store trigger entries by htid. I guess we need to
aggregate them on the actual values looked up.

The SQL spec also says that the contents of the FK check table should be
taken as at the start of the command, so we should be safe to aggregate
the values prior to the check.

As already suggested in work on Read Only Tables, we could optimise them
away to being constraint checks.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-05-29 06:44:34
Message-ID: 1212043474.7129.3.camel@huvostro
Lists: pgsql-bugs pgsql-hackers

On Wed, 2008-05-28 at 22:45 +0100, Simon Riggs wrote:
> On Wed, 2008-05-28 at 16:28 -0400, Tom Lane wrote:
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> > > "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> > >> This is expected to take lots of memory because each row-requiring-check
> > >> generates an entry in the pending trigger event list.
> >
> > > Hm, it occurs to me that we could still do a join against the pending event
> > > trigger list... I wonder how feasible it would be to store the pending trigger
> > > event list in a temporary table instead of in ram.
> >
> > We could make that list spill to disk, but the problem remains that
> > verifying the rows one at a time will take forever.
> >
> > The idea that's been kicked around occasionally is that once you get
> > past N pending events, throw them all away and instead queue a single
> > operation to do a bulk verify (just like initial establishment of the
> > FK constraint). I'm not sure how to do the queue management for this
> > though.
>
> Neither of those approaches is really suitable. Just spilling to disk is
> O(N) of the number of rows loaded, the second one is O(N) at least on
> the number of rows (loaded + existing). The second one doesn't help
> either since if the table was empty you'd have added the FK afterwards,
> so we must assume there is already rows in there and in most cases rows
> already loaded will exceed those being added by the bulk operation.
>
> AFAICS we must aggregate the trigger checks. We would need a special
> property of triggers that allowed them to be aggregated when two similar
> checks arrived. We can then use hash aggregation to accumulate them. We
> might conceivably need to spill to disk also, since the aggregation may
> not always be effective.

Can't we just do the checks for the FKs accumulated at the point they
don't fit in memory, instead of spilling to disk?

> But in most cases the tables against which FK
> checks are made are significantly smaller than the tables being loaded.
> Once we have hash aggregated them, that is then the first part of a hash
> join to the target table.
>
> We certainly need a TODO item for "improve RI checks during bulk
> operations".

Agreed.

----------------
Hannu


From: Decibel! <decibel(at)decibel(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #4204: COPY to table with FK has memory leak
Date: 2008-06-03 20:43:14
Message-ID: 64F489EE-978A-4766-BE42-1D02AA4CEC1F@decibel.org
Lists: pgsql-bugs pgsql-hackers

On May 28, 2008, at 1:22 PM, Gregory Stark wrote:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> "Tomasz Rybak" <bogomips(at)post(dot)pl> writes:
>>> I tried to use COPY to import 27M rows to table:
>>> CREATE TABLE sputnik.ccc24 (
>>> station CHARACTER(4) NOT NULL REFERENCES
>>> sputnik.station24 (id),
>>> moment INTEGER NOT NULL,
>>> flags INTEGER NOT NULL
>>> ) INHERITS (sputnik.sputnik);
>>> COPY sputnik.ccc24(id, moment, station, strength, sequence, flags)
>>> FROM '/tmp/24c3' WITH DELIMITER AS ' ';
>>
>> This is expected to take lots of memory because each row-requiring-
>> check
>> generates an entry in the pending trigger event list. Even if you
>> had
>> not exhausted memory, the actual execution of the retail checks would
>> have taken an unreasonable amount of time. The recommended way to do
>> this sort of thing is to add the REFERENCES constraint *after* you
>> load
>> all the data; that'll be a lot faster in most cases because the
>> checks
>> are done "in bulk" using a JOIN rather than one-at-a-time.
>
> Hm, it occurs to me that we could still do a join against the
> pending event
> trigger list... I wonder how feasible it would be to store the
> pending trigger
> event list in a temporary table instead of in ram.

Related to that, I really wish that our statement-level triggers
provided NEW and OLD recordsets like some other databases do. That
would allow for RI triggers to be done on a per-statement basis, and
they could aggregate keys to be checked.
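
For context, a sketch of what that could look like using the statement-level
transition tables PostgreSQL eventually added in version 10 (not available in
8.3; all names here are illustrative, and a real FK would still need
delete/update checks on the referenced side):

CREATE FUNCTION sputnik.check_station_fk() RETURNS trigger AS $$
BEGIN
    -- one set-oriented probe per statement instead of one check per row
    IF EXISTS (SELECT 1
                 FROM new_rows n
                 LEFT JOIN sputnik.station24 s ON s.id = n.station
                WHERE s.id IS NULL) THEN
        RAISE foreign_key_violation
              USING MESSAGE = 'station value not present in sputnik.station24';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER ccc24_station_bulk_check
    AFTER INSERT ON sputnik.ccc24
    REFERENCING NEW TABLE AS new_rows
    FOR EACH STATEMENT
    EXECUTE PROCEDURE sputnik.check_station_fk();
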
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828