Re: DELETE vs TRUNCATE explanation

From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: DELETE vs TRUNCATE explanation
Date: 2012-07-11 22:51:40
Message-ID: CAAZKuFYsK+ZQuC=zf0xE=M-LNa02tB7RBy3CHEUyq7LWCkK6jQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

On Wed, Jul 11, 2012 at 7:05 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <daniel(at)heroku(dot)com> writes:
>> TRUNCATE should simply be very nearly the fastest way to remove data
>> from a table while retaining its type information, and if that means
>> doing DELETE without triggers when the table is small, then it should.
>> The only person who could thwart me is someone who badly wants their
>> 128K table to be exactly 8 or 0K, which seems unlikely given the 5MB
>> of catalog anyway.
>
>> Does that sound reasonable? As in, would anyone object if TRUNCATE
>> learned this behavior?
>
> Yes, I will push back on that.
>
> (1) We don't need the extra complexity.

Well, a "need" is justified by the gains, no? It seems like this
follows from the thoughts presented afterwards, so I'll discuss those.

> (2) I don't believe that you know where the performance crossover point
> would be (according to what metric, anyway?).

Nope. I don't. But an exact crossover is a level of precision I don't
really need, because here are where things stand on a completely
unremarkable test suite on the closest project to me that meets the
"regular web-app" profile case:

With en-masse DELETE:
rake 41.89s user 3.08s system 76% cpu 58.629 total

With TRUNCATE:
rake 49.86s user 2.93s system 5% cpu 15:17.88 total

15x slower. This is a Macbook Air with full disk encryption and SSD
disk with fsync off, e.g. a very typical developer configuration.
This is a rather small schema -- probably a half a dozen tables, and
probably about a dozen indexes. This application is entirely
unremarkable in its test-database workload: it wants to load a few
records, do a few things, and then clear those handful of records.

> (3) The performance of the truncation itself should not be viewed in
> isolation; subsequent behavior also needs to be considered. An example
> of possible degradation is that index bloat would no longer be
> guaranteed to be cleaned up over a series of repeated truncations.
> (You might argue that if the table is small then the indexes couldn't
> be very bloated, but I don't think that holds up over a long series.)

I'm not entirely convinced to the mechanism, it was simply the most
obvious one, but I bet a one that is better in every respect is also
possible. It did occur to me that bloat might be a sticky point.

> IOW, I think it's fine as-is. I'd certainly wish to see many more
> than one complainant before we expend effort in this area.

I've seen way more than one complaint, and I'm quite sure there are
thousands of man hours (or more) spent on people who don't even know
to complain about such atrocious performance (or maybe it's so bad
that most people run a web search and find out, probably being left
really annoyed from having to yak shave as a result). In spite of how
familiar I am with Postgres and its mailing lists, I have glossed over
this for a long time, just thinking "wow, that really sucks" and only
now -- by serendipity of having skimmed this post -- have seen fit to
complain on behalf of quite a few rounds of dispensing workaround
advice to other people. It's only when this was brought to the fore
of my mind did I stop to consider how much wasted time I've seen in
people trying to figure this out over and over again (granted, they
tend to remember after the first time).

Perhaps a doc fix is all we need (TRUNCATE is constant-time on large
tables, but can be very slow compared to DELETE on small tables), but
I completely and enthusiastically reject any notion from people
calling this "contrived" or an "edge case," because people writing
software against PostgreSQL that have unit tests have this use case
constantly, often dozens or even hundreds of times a day.

What I don't know is how many people figure out that they should use
DELETE instead, and after how long. Even though the teams I work with
are very familiar with many of the finer points of Postgres, doing
some probing for the first time took a little while.

If we're going to live with it, I contest that we should own it as a
real and substantial weakness for development productivity, and not
sweep it under the rug as some "contrived" or "corner" case.

--
fdr

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2012-07-11 23:03:34 Re: SP-GiST for ranges based on 2d-mapping and quad-tree
Previous Message Craig James 2012-07-11 22:09:56 Re: DELETE vs TRUNCATE explanation

Browse pgsql-performance by date

  From Date Subject
Next Message codevally 2012-07-12 00:03:39 PostgreSQL index issue
Previous Message Craig James 2012-07-11 22:09:56 Re: DELETE vs TRUNCATE explanation