Quick Links

Re: [PERFORM] DELETE vs TRUNCATE explanation

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc:	Daniel Farina <daniel(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Harold A(dot) Giménez <harold(dot)gimenez(at)gmail(dot)com>
Subject:	Re: [PERFORM] DELETE vs TRUNCATE explanation
Date:	2012-07-13 04:55:22
Message-ID:	CAMkU=1w7vLA63hf-+Uc_j61CvOdKixsOHph9d0H-aLNBYgfX=w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers pgsql-performance

I've moved this thread from performance to hackers.

The topic was poor performance when truncating lots of small tables
repeatedly on test environments with fsync=off.

On Thu, Jul 12, 2012 at 6:00 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> I think the problem is in the Fsync Absorption queue. Every truncate
> adds a FORGET_RELATION_FSYNC to the queue, and processing each one of
> those leads to sequential scanning the checkpointer's pending ops hash
> table, which is quite large. It is almost entirely full of other
> requests which have already been canceled, but it still has to dig
> through them all. So this is essentially an N^2 operation.

My attached Proof of Concept patch reduces the run time of the
benchmark at the end of this message from 650sec to 84sec,
demonstrating that this is in fact the problem. Which doesn't mean
that my patch is the right answer to it, of course.

(The delete option is still faster than truncate, coming in at around 55sec)

> I'm not sure why we don't just delete the entry instead of marking it
> as cancelled. It looks like the only problem is that you can't delete
> an entry other than the one just returned by hash_seq_search. Which
> would be fine, as that is the entry that we would want to delete;
> except that mdsync might have a different hash_seq_search open, and so
> it wouldn't be safe to delete.
>
> If the segno was taken out of the hash key and handled some other way,
> then the forgetting could be done with a simple hash look up rather
> than a full scan.

The above two ideas might be the better solution, as they would work
even when fsync=on. Since BBU are becoming so popular I think the
fsync queue could be a problem even with fsync on if the fsync is fast
enough. But I don't immediately know how to implement them.

> Maybe we could just turn off the pending ops table altogether when
> fsync=off, but since fsync is PGC_SIGHUP it is not clear how you could
> safely turn it back on.

Now that I think about it, I don't see how turning fsync from off to
on can ever be known to be safe, until a system wide sync has
intervened. After all a segment that was dirtied and added to the
pending ops table while fsync=off might also be removed from the
pending ops table the microsecond before fsync is turned on, so how is
that different from never adding it in the first place?

The attached Proof Of Concept patch implements this in two ways, one
of which is commented out. The commented out way omits the overhead
of sending the request to the checkpointer in the first place, but
breaks modularity a bit.

The benchmark used on 9.3devel head is:

fsync=off, all other defaults.

## one time initialization
perl -le 'print "create schema foo$_; create table foo$_.foo$_ (k
integer, v integer);" $ARGV[0]..$ARGV[0]+$ARGV[1]-1' 0 10 |psql

## actual benchmark.
perl -le 'print "set client_min_messages=warning;";
foreach (1..10000) {
print "BEGIN;\n";
print "insert into foo$_.foo$_ select * from
generate_series(1,10); " foreach $ARGV[0]..$ARGV[0]+$ARGV[1]-1;
print "COMMIT;\nBEGIN;\n";
print "truncate table foo$_.foo$_; " foreach
$ARGV[0]..$ARGV[0]+$ARGV[1]-1;
#print "delete from foo$_.foo$_; " foreach
$ARGV[0]..$ARGV[0]+$ARGV[1]-1;
print "COMMIT;\n"
} ' 0 10 | time psql > /dev/null

Cheers,

Jeff

Attachment	Content-Type	Size
fsync_queue_POC.patch	application/octet-stream	1.1 KB

In response to

Re: DELETE vs TRUNCATE explanation at 2012-07-13 01:00:49 from Jeff Janes

Responses

Re: [PERFORM] DELETE vs TRUNCATE explanation at 2012-07-15 00:10:18 from Jeff Janes

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Amit Kapila	2012-07-13 04:55:37	Re: Synchronous Standalone Master Redoux
Previous Message	Etsuro Fujita	2012-07-13 03:35:12	Re: pgsql_fdw in contrib

Browse pgsql-performance by date

	From	Date	Subject
Next Message	David Kerr	2012-07-13 06:51:22	slow prepare, lots of semop calls.
Previous Message	Yan Chunlu	2012-07-13 04:02:02	Re: how could select id=xx so slow?