Re: [PERFORM] DELETE vs TRUNCATE explanation

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Harold A(dot) Giménez <harold(dot)gimenez(at)gmail(dot)com>
Subject: Re: [PERFORM] DELETE vs TRUNCATE explanation
Date: 2012-07-13 04:55:22
Message-ID: CAMkU=1w7vLA63hf-+Uc_j61CvOdKixsOHph9d0H-aLNBYgfX=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

I've moved this thread from performance to hackers.

The topic was poor performance when truncating lots of small tables
repeatedly on test environments with fsync=off.

On Thu, Jul 12, 2012 at 6:00 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> I think the problem is in the Fsync Absorption queue. Every truncate
> adds a FORGET_RELATION_FSYNC to the queue, and processing each one of
> those leads to sequential scanning the checkpointer's pending ops hash
> table, which is quite large. It is almost entirely full of other
> requests which have already been canceled, but it still has to dig
> through them all. So this is essentially an N^2 operation.

My attached Proof of Concept patch reduces the run time of the
benchmark at the end of this message from 650sec to 84sec,
demonstrating that this is in fact the problem. Which doesn't mean
that my patch is the right answer to it, of course.

(The delete option is still faster than truncate, coming in at around 55sec)

> I'm not sure why we don't just delete the entry instead of marking it
> as cancelled. It looks like the only problem is that you can't delete
> an entry other than the one just returned by hash_seq_search. Which
> would be fine, as that is the entry that we would want to delete;
> except that mdsync might have a different hash_seq_search open, and so
> it wouldn't be safe to delete.
>
> If the segno was taken out of the hash key and handled some other way,
> then the forgetting could be done with a simple hash look up rather
> than a full scan.

The above two ideas might be the better solution, as they would work
even when fsync=on. Since BBU are becoming so popular I think the
fsync queue could be a problem even with fsync on if the fsync is fast
enough. But I don't immediately know how to implement them.

> Maybe we could just turn off the pending ops table altogether when
> fsync=off, but since fsync is PGC_SIGHUP it is not clear how you could
> safely turn it back on.

Now that I think about it, I don't see how turning fsync from off to
on can ever be known to be safe, until a system wide sync has
intervened. After all a segment that was dirtied and added to the
pending ops table while fsync=off might also be removed from the
pending ops table the microsecond before fsync is turned on, so how is
that different from never adding it in the first place?

The attached Proof Of Concept patch implements this in two ways, one
of which is commented out. The commented out way omits the overhead
of sending the request to the checkpointer in the first place, but
breaks modularity a bit.

The benchmark used on 9.3devel head is:

fsync=off, all other defaults.

## one time initialization
perl -le 'print "create schema foo$_; create table foo$_.foo$_ (k
integer, v integer);" $ARGV[0]..$ARGV[0]+$ARGV[1]-1' 0 10 |psql

## actual benchmark.
perl -le 'print "set client_min_messages=warning;";
foreach (1..10000) {
print "BEGIN;\n";
print "insert into foo$_.foo$_ select * from
generate_series(1,10); " foreach $ARGV[0]..$ARGV[0]+$ARGV[1]-1;
print "COMMIT;\nBEGIN;\n";
print "truncate table foo$_.foo$_; " foreach
$ARGV[0]..$ARGV[0]+$ARGV[1]-1;
#print "delete from foo$_.foo$_; " foreach
$ARGV[0]..$ARGV[0]+$ARGV[1]-1;
print "COMMIT;\n"
} ' 0 10 | time psql > /dev/null

Cheers,

Jeff

Attachment Content-Type Size
fsync_queue_POC.patch application/octet-stream 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-07-13 04:55:37 Re: Synchronous Standalone Master Redoux
Previous Message Etsuro Fujita 2012-07-13 03:35:12 Re: pgsql_fdw in contrib

Browse pgsql-performance by date

  From Date Subject
Next Message David Kerr 2012-07-13 06:51:22 slow prepare, lots of semop calls.
Previous Message Yan Chunlu 2012-07-13 04:02:02 Re: how could select id=xx so slow?