Re: LIMIT for UPDATE and DELETE

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Rukh Meski <rukh(dot)meski(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LIMIT for UPDATE and DELETE
Date: 2014-08-25 18:25:39
Message-ID: CAMkU=1w8B+y9ap9KywxBEXpLd9vwkN40ACqNjxHgC8=60aSyTg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Aug 24, 2014 at 11:48 PM, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hi Rukh,
>
>
> (2014/08/15 6:18), Rukh Meski wrote:
>
>> Based on the feedback on my previous patch, I've separated only the
>> LIMIT part into its own feature. This version plays nicely with
>> inheritance. The intended use is splitting up big UPDATEs and DELETEs
>> into batches more easily and efficiently.
>>
>
> Before looking into the patch, I'd like to know the use cases in more
> details.
>

There are two common use cases I can think of:

1)

I've just added a column to an existing table, and it is all NULL. I've
changed the code to populate that column appropriately for new or updated
rows, but I need to backfill the existing rows. I have a (slow) method to
compute the new value. (I've not yet changed the code to depend on that
column being populated.)

The obvious solution is:

update the_table set new_col=populate_new_col(whatever) where new_col is
null.

But this will bloat the table because vacuum cannot intervene, and it will
take a very long time. The first row to be updated will remain locked until
the last row gets updated, which is not acceptable. And if something goes
wrong before the commit, you've lost all the work.

With the limit clause, you can just do this:

update the_table set new_col=populate_new_col(whatever) where new_col is
null limit 50000;

Run that in a loop with appropriate vacuuming and throttling.
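
For example, driven from the shell (just a sketch, assuming the proposed
LIMIT syntax and a database named mydb; the batch size, vacuum cadence and
pause are whatever suits the table):

    while :; do
        # each psql call is its own transaction, so dead row versions become
        # reclaimable as soon as the batch commits
        tag=$(psql -X -c "update the_table set new_col = populate_new_col(whatever)
                          where new_col is null limit 50000" mydb)
        rows=${tag##* }              # command tag looks like "UPDATE 50000"
        [ "$rows" -eq 0 ] && break   # nothing left to backfill
        vacuumdb -t the_table mydb   # reclaim the space freed by the last batch
        sleep 5                      # throttle
    done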

2)

I've introduced or re-designed partitioning, and need to migrate rows to
the appropriate partitions without long-lived row locks.

create table pgbench_accounts2 () inherits (pgbench_accounts);

and then in a loop:

with t as (delete from only pgbench_accounts where aid < 500000 limit 5000
returning *)
insert into pgbench_accounts2 select * from t;
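
Driven the same way, stopping once a batch moves nothing (again only a
sketch, assuming the proposed LIMIT syntax and that the database is named
pgbench):

    while :; do
        tag=$(psql -X -c "with t as (delete from only pgbench_accounts
                                     where aid < 500000 limit 5000 returning *)
                          insert into pgbench_accounts2 select * from t" pgbench)
        # psql reports the outer INSERT's command tag, e.g. "INSERT 0 5000"
        [ "$tag" = "INSERT 0 0" ] && break
        sleep 1                      # throttle between batches
    done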

Cheers,

Jeff
