Re: Slow Bulk Delete

Lists: pgsql-performance
From: thilo <thilo(dot)tanner(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Slow Bulk Delete
Date: 2010-05-08 11:39:58
Message-ID: f923f5db-a6b1-4029-8ebe-a6d1cc35932a@e2g2000yqn.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi all!

We moved from MySQL to Postgresql for some of our projects. So far
we're very impressed with the performance (especially INSERTs and
UPDATEs), except for a strange problem with the following bulk delete
query:

DELETE FROM table1 WHERE table2_id = ?

I went through these Wiki pages, trying to solve the problem:

http://wiki.postgresql.org/wiki/SlowQueryQuestions and
http://wiki.postgresql.org/wiki/Performance_Optimization

but unfortunately without much luck.

Our application is doing batch jobs. On every batch run, we must
delete approx. 1M rows in table1 and recreate these entries. The
inserts are very fast, but deletes are not. We cannot make updates,
because there's no identifying property in the objects of table1.

This is what EXPLAIN is telling me:

EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------
Index Scan using sr_index on table1 (cost=0.00..8.56 rows=4 width=6)
(actual time=0.111..0.154 rows=4 loops=1)
Index Cond: (table2_id = 11242939)
Total runtime: 0.421 ms
(3 rows)

This seems to be very fast (using the index), but running this query
from JDBC takes up to 20ms each. For 1M rows this sum up to several
hours. When I have a look at pg_top psql uses most of the time for the
deletes. CPU usage is 100% (for the core used by postgresql). So it
seems that postgresql is doing some sequential scanning or constraint
checks.

This is the table structure:

id bigint (primary key)
table2_id bigint (foreign key constraint to table 2, *indexed*)
table3_id bigint (foreign key constraint to table 3, *indexed*)
some non-referenced text and boolean fields

My server settings (Potgresql 8.4.2):

shared_buffers = 1024MB
effective_cache_size = 2048MB
work_mem = 128MB
wal_buffers = 64MB
checkpoint_segments = 32
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9

It would be very nice to give me a hint to solve the problem. It
drives me crazy ;-)

If you need more details please feel free to ask!

Thanks in advance for your help!

Kind regards

Thilo


From: Andy Colson <andy(at)squeakycode(dot)net>
To: thilo <thilo(dot)tanner(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-08 13:17:13
Message-ID: 4BE56459.4040007@squeakycode.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 05/08/2010 06:39 AM, thilo wrote:
> Hi all!
>
> We moved from MySQL to Postgresql for some of our projects. So far
> we're very impressed with the performance (especially INSERTs and
> UPDATEs), except for a strange problem with the following bulk delete
> query:
>
> DELETE FROM table1 WHERE table2_id = ?
>
> I went through these Wiki pages, trying to solve the problem:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions and
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> but unfortunately without much luck.
>
> Our application is doing batch jobs. On every batch run, we must
> delete approx. 1M rows in table1 and recreate these entries. The
> inserts are very fast, but deletes are not. We cannot make updates,
> because there's no identifying property in the objects of table1.
>
> This is what EXPLAIN is telling me:
>
> EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id = 11242939
> QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------
> Index Scan using sr_index on table1 (cost=0.00..8.56 rows=4 width=6)
> (actual time=0.111..0.154 rows=4 loops=1)
> Index Cond: (table2_id = 11242939)
> Total runtime: 0.421 ms
> (3 rows)
>
> This seems to be very fast (using the index), but running this query
> from JDBC takes up to 20ms each. For 1M rows this sum up to several
> hours. When I have a look at pg_top psql uses most of the time for the
> deletes. CPU usage is 100% (for the core used by postgresql). So it
> seems that postgresql is doing some sequential scanning or constraint
> checks.
>
> This is the table structure:
>
> id bigint (primary key)
> table2_id bigint (foreign key constraint to table 2, *indexed*)
> table3_id bigint (foreign key constraint to table 3, *indexed*)
> some non-referenced text and boolean fields
>
> My server settings (Potgresql 8.4.2):
>
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> work_mem = 128MB
> wal_buffers = 64MB
> checkpoint_segments = 32
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
>
> It would be very nice to give me a hint to solve the problem. It
> drives me crazy ;-)
>
> If you need more details please feel free to ask!
>
> Thanks in advance for your help!
>
> Kind regards
>
> Thilo

I am going to guess the slow part is sending 1M different queries back and forth from client to server. You could try batching them together:

DELETE FROM table1 WHERE table2_id in (11242939, 1,2,3,4,5...., 42);

Also are you preparing the query?

-Andy


From: thilo <thilo(dot)tanner(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-09 21:26:28
Message-ID: a79c1af7-8ef3-4d9a-9341-c9ff7299dd4a@b18g2000yqb.googlegroups.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Andy!

Thanks a lot for your hints!

Indeed the problem was on my side. Some Hibernate tuning solved the
problem (and I was able to speedup the query using IN). The real
problem was that Hibernate using unprepared queries if you create a
native query, but prepares the query if you use JP-QL (very odd
behavior). Thanks anyway for your help!

Kind regards

Thilo

> I am going to guess the slow part is sending 1M different queries back and forth from client to server.  You could try batching them together:
>
> DELETE FROM table1 WHERE table2_id in (11242939, 1,2,3,4,5...., 42);
>
> Also are you preparing the query?
>
> -Andy
>
> --
> Sent via pgsql-performance mailing list (pgsql-performa(dot)(dot)(dot)(at)postgresql(dot)org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-performance


From: Bob Lunney <bob_lunney(at)yahoo(dot)com>
To: pgsql-performance(at)postgresql(dot)org, thilo <thilo(dot)tanner(at)gmail(dot)com>
Subject: Re: Slow Bulk Delete
Date: 2010-05-13 03:13:42
Message-ID: 821768.44263.qm@web39705.mail.mud.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thilo,

Just a few of thoughts off the top of my head:

1. If you know the ids of the rows you want to delete beforhand, insert them in a table, then run the delete based on a join with this table.

2. Better yet, insert the ids into a table using COPY, then use a join to create a new table with the rows you want to keep from the first table. Drop the original source table, truncate the id table, rename the copied table and add indexes and constraints.

3. See if you can partition the table somehow so the rows you want to delete are in a single partitioned child table. When its time to delete them just drop the child table.

Of course, if the 1M rows you need to delete is very small compared to the total overall size of the original table the first two techniques might now buy you anything, but its worth a try.

Good luck!

Bob Lunney

--- On Sat, 5/8/10, thilo <thilo(dot)tanner(at)gmail(dot)com> wrote:

> From: thilo <thilo(dot)tanner(at)gmail(dot)com>
> Subject: [PERFORM] Slow Bulk Delete
> To: pgsql-performance(at)postgresql(dot)org
> Date: Saturday, May 8, 2010, 7:39 AM
> Hi all!
>
> We moved from MySQL to Postgresql for some of our projects.
> So far
> we're very impressed with the performance (especially
> INSERTs and
> UPDATEs), except for a strange problem with the following
> bulk delete
> query:
>
> DELETE FROM table1 WHERE table2_id = ?
>
> I went through these Wiki pages, trying to solve the
> problem:
>
> http://wiki.postgresql.org/wiki/SlowQueryQuestions and
> http://wiki.postgresql.org/wiki/Performance_Optimization
>
> but unfortunately without much luck.
>
> Our application is doing batch jobs. On every batch run, we
> must
> delete approx. 1M rows in table1 and recreate these
> entries. The
> inserts are very fast, but deletes are not. We cannot make
> updates,
> because there's no identifying property in the objects of
> table1.
>
> This is what EXPLAIN is telling me:
>
> EXPLAIN ANALYZE DELETE FROM table1 WHERE table2_id =
> 11242939
>                
>                
>                
>          QUERY
> PLAN
> ----------------------------------------------------------------------------------------------------------------------------
> Index Scan using sr_index on table1  (cost=0.00..8.56
> rows=4 width=6)
> (actual time=0.111..0.154 rows=4 loops=1)
>    Index Cond: (table2_id = 11242939)
> Total runtime: 0.421 ms
> (3 rows)
>
> This seems to be very fast (using the index), but running
> this query
> from JDBC takes up to 20ms each. For 1M rows this sum up to
> several
> hours. When I have a look at pg_top psql uses most of the
> time for the
> deletes. CPU usage is 100% (for the core used by
> postgresql). So it
> seems that postgresql is doing some sequential scanning or
> constraint
> checks.
>
> This is the table structure:
>
> id   
> bigint     (primary key)
> table2_id   
> bigint     (foreign key constraint
> to table 2, *indexed*)
> table3_id   
> bigint     (foreign key constraint
> to table 3, *indexed*)
> some non-referenced text and boolean fields
>
> My server settings (Potgresql 8.4.2):
>
> shared_buffers = 1024MB
> effective_cache_size = 2048MB
> work_mem = 128MB
> wal_buffers = 64MB
> checkpoint_segments = 32
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
>
> It would be very nice to give me a hint to solve the
> problem. It
> drives me crazy ;-)
>
> If you need more details please feel free to ask!
>
> Thanks in advance for your help!
>
> Kind regards
>
> Thilo
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: "Pierre C" <lists(at)peufeu(dot)com>
To: thilo <thilo(dot)tanner(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 10:10:31
Message-ID: op.vcug7tj0eorkce@immo.peufeu.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> DELETE FROM table1 WHERE table2_id = ?

For bulk deletes, try :

DELETE FROM table1 WHERE table2_id IN (list of a few thousands ids)

- or use a JOIN delete with a virtual VALUES table
- or fill a temp table with ids and use a JOIN DELETE

This will save cliet/server roundtrips.

Now, something that can make a DELETE very slow is a non-indexed ON DELETE
CASCADE foreign key : when you DELETE FROM table1 and it cascades to a
DELETE on table2, and you forget the index on table2. Also check the time
spent in triggers. Do you have a GIN index ?


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 11:54:26
Message-ID: AANLkTinEPzUlYprV0Cm5rvFPl5AQg08483pxAgSIxAnJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists(at)peufeu(dot)com> wrote:
> - or use a JOIN delete with a virtual VALUES table
> - or fill a temp table with ids and use a JOIN DELETE

What is a virtual VALUES table? Can you give me an example of using a
virtual table with selects, joins, and also deletes?

--
Jon


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 12:07:17
Message-ID: AANLkTinamss2q5gUWJEuMfzXVK3_43hyraPYkMWF4j46@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

2010/5/17 Jon Nelson <jnelson+pgsql(at)jamponi(dot)net<jnelson%2Bpgsql(at)jamponi(dot)net>
>

> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists(at)peufeu(dot)com> wrote:
> > - or use a JOIN delete with a virtual VALUES table
> > - or fill a temp table with ids and use a JOIN DELETE
>
> What is a virtual VALUES table? Can you give me an example of using a
> virtual table with selects, joins, and also deletes?
>
>
>
delete from a using (values (1),(2),(5),(8)) b(x) where a.id=b.x

See http://www.postgresql.org/docs/8.4/static/sql-values.html

--
Best regards,
Vitalii Tymchyshyn


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 12:15:52
Message-ID: AANLkTik2VPw7vSFvNZHAu8_TAXExqiTkhH4n4ylBvzlA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, May 17, 2010 at 12:54 PM, Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> wrote:
> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists(at)peufeu(dot)com> wrote:
>> - or use a JOIN delete with a virtual VALUES table
>> - or fill a temp table with ids and use a JOIN DELETE
>
> What is a virtual VALUES table? Can you give me an example of using a
> virtual table with selects, joins, and also deletes?
>

I think he refers to the way you pass values in insert, and alike:
INSERT INTO foo(a,b) VALUES(1,2), (2,3), (3,4);


From: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 12:28:16
Message-ID: 20100517122816.GA31501@a-kretschmer.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

In response to Jon Nelson :
> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists(at)peufeu(dot)com> wrote:
> > - or use a JOIN delete with a virtual VALUES table
> > - or fill a temp table with ids and use a JOIN DELETE
>
> What is a virtual VALUES table? Can you give me an example of using a
> virtual table with selects, joins, and also deletes?

Something like this:

test=# select * from foo;
c1
----
1
2
3
4
(4 rows)

test=*# delete from foo using (values (1),(2) ) as bla where foo.c1=bla.column1;
DELETE 2
test=*# select * from foo;
c1
----
3
4
(2 rows)

values (1), (2) as bla -> returns a 'virtual table' bla with one
column column1.

Andreas
--
Andreas Kretschmer
Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 12:28:38
Message-ID: AANLkTimtJy7hLkB5kI9VWBN2DNYg9SvmIiFUolWzV4it@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

2010/5/17 Віталій Тимчишин <tivv00(at)gmail(dot)com>:
>
>
> 2010/5/17 Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
>>
>> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists(at)peufeu(dot)com> wrote:
>> > - or use a JOIN delete with a virtual VALUES table
>> > - or fill a temp table with ids and use a JOIN DELETE
>>
>> What is a virtual VALUES table? Can you give me an example of using a
>> virtual table with selects, joins, and also deletes?
>>
>>
>
> delete from a using (values (1),(2),(5),(8)) b(x) where a.id=b.x
> See http://www.postgresql.org/docs/8.4/static/sql-values.html

This syntax I'm familiar with. The author of the previous message
(Pierre C) indicated that there is a concept of a virtual table which
could be joined to. I'd like to know what this virtual table thing
is, specifically in the context of joins.

--
Jon


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 12:33:09
Message-ID: AANLkTimXDBifOIHsw2N2Tsq-XkY5Kohp6Fhy5TAlBjIV@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

again VALUES(1,2), (2,3), ....; is a 'virtual table', as he calls it.
It really is not a table to postgresql. I guess he is just using that
naming convention.


From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To: "A(dot) Kretschmer" <andreas(dot)kretschmer(at)schollglas(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Slow Bulk Delete
Date: 2010-05-17 12:35:35
Message-ID: AANLkTilWtM3tFBqxll5h_V0YWnB4sxgPyxb1bauwxk_2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, May 17, 2010 at 7:28 AM, A. Kretschmer
<andreas(dot)kretschmer(at)schollglas(dot)com> wrote:
> In response to Jon Nelson :
>> On Mon, May 17, 2010 at 5:10 AM, Pierre C <lists(at)peufeu(dot)com> wrote:
>> > - or use a JOIN delete with a virtual VALUES table
>> > - or fill a temp table with ids and use a JOIN DELETE
>>
>> What is a virtual VALUES table? Can you give me an example of using a
>> virtual table with selects, joins, and also deletes?
>
> Something like this:
...

delete from foo using (values (1),(2) ) as bla where foo.c1=bla.column1;
...

Aha! Cool. That's not quite what I envisioned when you said virtual
table, but it surely clarifies things.
Thanks!

--
Jon