From: | Thom Brown <thombrown(at)gmail(dot)com> |
---|---|
To: | pgadmin-hackers <pgadmin-hackers(at)postgresql(dot)org> |
Subject: | Feature request: limited deletions |
Date: | 2010-04-08 10:06:18 |
Message-ID: | g2sbddc86151004080306g24e0407dxd55dc9a7c89cd575@mail.gmail.com |
Lists: | pgadmin-hackers pgsql-hackers |
I couldn't find any discussion on this, but the request is quite
straightforward. Implement a LIMIT on DELETE statements like SELECT
statements.
So you could write:
DELETE FROM massive_table WHERE id < 40000000 LIMIT 10000;
This would allow deletions in smaller batches rather than waiting
potentially hours for the server to mark all those rows as deleted and
commit it as one massive transaction.
... although I suspect there will be reasons against this.
Thom
From: | Ian Barwick <barwick(at)gmail(dot)com> |
---|---|
To: | Thom Brown <thombrown(at)gmail(dot)com> |
Cc: | pgadmin-hackers <pgadmin-hackers(at)postgresql(dot)org> |
Subject: | Re: Feature request: limited deletions |
Date: | 2010-04-08 10:55:30 |
Message-ID: | o2h1d581afe1004080355i58f97865v5cf3c46d9b133208@mail.gmail.com |
Lists: | pgadmin-hackers pgsql-hackers |
2010/4/8 Thom Brown <thombrown(at)gmail(dot)com>:
> I couldn't find any discussion on this, but the request is quite
> straightforward. Implement a LIMIT on DELETE statements like SELECT
> statements.
>
> So you could write:
>
> DELETE FROM massive_table WHERE id < 40000000 LIMIT 10000;
>
> This would allow deletions in smaller batches rather than waiting
> potentially hours for the server to mark all those rows as deleted and
> commit it as one massive transaction.
Is this a PgAdmin-specific question? If it is, apologies I am missing
the context.
If not, this is totally the wrong list, but why not use a subquery to
control what is deleted?
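For example, something along these lines should do it (a sketch, untested, assuming id is an indexed unique key):

```sql
-- Delete at most 10000 of the matching rows by selecting their ids
-- in a subquery; ORDER BY makes the LIMIT deterministic.
DELETE FROM massive_table
WHERE id IN (SELECT id FROM massive_table
             WHERE id < 40000000
             ORDER BY id
             LIMIT 10000);
```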
Ian Barwick
From: | Thom Brown <thombrown(at)gmail(dot)com> |
---|---|
To: | Ian Barwick <barwick(at)gmail(dot)com> |
Cc: | pgadmin-hackers <pgadmin-hackers(at)postgresql(dot)org> |
Subject: | Re: Feature request: limited deletions |
Date: | 2010-04-08 11:05:11 |
Message-ID: | h2wbddc86151004080405iaf2652c7uc561c89336e2ede2@mail.gmail.com |
Lists: | pgadmin-hackers pgsql-hackers |
On 8 April 2010 11:55, Ian Barwick <barwick(at)gmail(dot)com> wrote:
> 2010/4/8 Thom Brown <thombrown(at)gmail(dot)com>:
> > I couldn't find any discussion on this, but the request is quite
> > straightforward. Implement a LIMIT on DELETE statements like SELECT
> > statements.
> >
> > So you could write:
> >
> > DELETE FROM massive_table WHERE id < 40000000 LIMIT 10000;
> >
> > This would allow deletions in smaller batches rather than waiting
> > potentially hours for the server to mark all those rows as deleted and
> > commit it as one massive transaction.
>
> Is this a PgAdmin-specific question? If it is, apologies I am missing
> the context.
>
> If not, this is totally the wrong list, but why not use a subquery to
> control what is deleted?
>
>
> Ian Barwick
>
Erm... my mistake, I thought this was on the generic hackers list. Moving
it over in this reply.
Thom
From: | Thom Brown <thombrown(at)gmail(dot)com> |
---|---|
To: | Ian Barwick <barwick(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [pgadmin-hackers] Feature request: limited deletions |
Date: | 2010-04-08 11:05:51 |
Message-ID: | i2ibddc86151004080405vbdbe49c1t4dd9b5cf55ed5bc4@mail.gmail.com |
Lists: | pgadmin-hackers pgsql-hackers |
On 8 April 2010 11:55, Ian Barwick <barwick(at)gmail(dot)com> wrote:
> 2010/4/8 Thom Brown <thombrown(at)gmail(dot)com>:
> > I couldn't find any discussion on this, but the request is quite
> > straightforward. Implement a LIMIT on DELETE statements like SELECT
> > statements.
> >
> > So you could write:
> >
> > DELETE FROM massive_table WHERE id < 40000000 LIMIT 10000;
> >
> > This would allow deletions in smaller batches rather than waiting
> > potentially hours for the server to mark all those rows as deleted and
> > commit it as one massive transaction.
>
> Is this a PgAdmin-specific question? If it is, apologies I am missing
> the context.
>
> If not, this is totally the wrong list, but why not use a subquery to
> control what is deleted?
>
>
> Ian Barwick
>
Erm... my mistake, I thought this was on the generic hackers list. Moving
it over in this reply.
Thom
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Thom Brown <thombrown(at)gmail(dot)com> |
Cc: | Ian Barwick <barwick(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [pgadmin-hackers] Feature request: limited deletions |
Date: | 2010-04-08 11:45:02 |
Message-ID: | o2i603c8f071004080445w8971602br895805b5cbfbb44f@mail.gmail.com |
Lists: | pgadmin-hackers pgsql-hackers |
On Thu, Apr 8, 2010 at 7:05 AM, Thom Brown <thombrown(at)gmail(dot)com> wrote:
> On 8 April 2010 11:55, Ian Barwick <barwick(at)gmail(dot)com> wrote:
>>
>> 2010/4/8 Thom Brown <thombrown(at)gmail(dot)com>:
>> > I couldn't find any discussion on this, but the request is quite
>> > straightforward. Implement a LIMIT on DELETE statements like SELECT
>> > statements.
>> >
>> > So you could write:
>> >
>> > DELETE FROM massive_table WHERE id < 40000000 LIMIT 10000;
>> >
>> > This would allow deletions in smaller batches rather than waiting
>> > potentially hours for the server to mark all those rows as deleted and
>> > commit it as one massive transaction.
>>
>> Is this a PgAdmin-specific question? If it is, apologies I am missing
>> the context.
>>
>> If not, this is totally the wrong list, but why not use a subquery to
>> control what is deleted?
>
> Erm... my mistake, I thought this was on the generic hackers list. Moving
> it over in this reply.
I've certainly worked around the lack of this syntax more than once.
And I bet it's not even that hard to implement.
...Robert
From: | Csaba Nagy <ncslists(at)googlemail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Thom Brown <thombrown(at)gmail(dot)com>, Ian Barwick <barwick(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [pgadmin-hackers] Feature request: limited deletions |
Date: | 2010-04-08 12:24:35 |
Message-ID: | 1270729475.1157.115.camel@pcd12478 |
Lists: | pgadmin-hackers pgsql-hackers |
Hi all,
On Thu, 2010-04-08 at 07:45 -0400, Robert Haas wrote:
> >> 2010/4/8 Thom Brown <thombrown(at)gmail(dot)com>:
> >> > So you could write:
> >> >
> >> > DELETE FROM massive_table WHERE id < 40000000 LIMIT 10000;
> I've certainly worked around the lack of this syntax more than once.
> And I bet it's not even that hard to implement.
The fact that it's not implemented has nothing to do with its
complexity (in fact it is probably just a matter of enabling it) -
you'll have a hard time convincing some old-time hackers on this list
that the non-determinism inherent in this kind of query is
acceptable ;-)
There is a workaround which in fact works quite well:
DELETE FROM massive_table WHERE ctid = ANY(ARRAY(SELECT ctid FROM
massive_table WHERE id < 40000000 LIMIT 10000));
Just run an EXPLAIN on it and you'll see it won't get any better. But
beware that it might be less efficient than you expect: you will likely
be sequentially scanning the table for each chunk unless you add some
selective WHERE conditions too - and even then you'll still scan the
whole already-deleted part rather than just the next chunk. The deleted
records won't get out of the way by magic; you need to vacuum, and on a
big table that's probably a problem too. So on a massive table this will
likely help less than you think: the run time per chunk will increase
with each chunk unless you're able to vacuum efficiently. In any case
you need to balance the chunk size against the scanned portion of the
table so that you get a reasonable run time per chunk without too much
overhead from the whole chunking process...
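Putting that together, the whole chunked delete can be sketched roughly like this (untested; note that inside a DO block the entire loop runs as one transaction, so to actually commit and vacuum between chunks you would drive the same DELETE from the client instead):

```sql
-- Hypothetical chunked delete built on the ctid workaround above:
-- repeat the bounded DELETE until no matching rows remain.
DO $$
DECLARE
  deleted integer;
BEGIN
  LOOP
    DELETE FROM massive_table
    WHERE ctid = ANY(ARRAY(SELECT ctid FROM massive_table
                           WHERE id < 40000000 LIMIT 10000));
    GET DIAGNOSTICS deleted = ROW_COUNT;
    EXIT WHEN deleted = 0;
  END LOOP;
END $$;
```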
Cheers,
Csaba.