Re: PATCH: optimized DROP of multiple tables within a transaction

From: 花田 茂 <shigeru(dot)hanada(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: optimized DROP of multiple tables within a transaction
Date: 2012-10-18 02:28:32
Message-ID: C140EC80-DF0D-4963-A65E-5E49E20B2781@gmail.com
Lists: pgsql-hackers

Hi Tomas,

On 2012/10/17, at 20:45, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>
> Dne 17.10.2012 12:34, Shigeru HANADA napsal:
>> Performance test
>> ================
>> I tested the 1000-table case (each table a copy of pgbench_branches with
>> 100,000 rows) on a server with 1GB shared_buffers. Please note that I
>> tested on a MacBook Air, i.e. the storage is an SSD, not an HDD. Here is
>> the test procedure:
>>
>> 1) loop 1000 times
>> 1-1) create copy table of pgbench_accounts as accounts$i
>> 1-2) load 100000 rows
>> 1-3) add primary key
>> 1-4) select all rows to cache pages in shared buffer
>> 2) BEGIN
>> 3) loop 1000 times
>> 3-1) DROP TABLE accounts$i
>> 4) COMMIT
>
> I don't think the 'load rows' and 'select all rows' steps are really
> necessary. And AFAIK sequential scans use a small ring buffer so as not to
> pollute shared buffers, so I'd guess the pages are not cached in shared
> buffers anyway. Have you verified that, e.g. with pg_buffercache?

Oops, you're right. I omitted steps 1-3 and 1-4 in the actual measurement, but IMO loading the data (1-2) is still necessary to fill shared buffers, because the number of buffers invalidated during COMMIT is the major factor this patch addresses. And yes, I verified that all shared buffers are in use after the loading finished.
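The verification can be done with a query like the one below (just a sketch; it assumes the pg_buffercache extension is installed, and the database name "bench" is a placeholder):

```shell
#!/bin/sh
# Sketch of the buffer-usage check (assumes the pg_buffercache
# extension is installed; "bench" is a placeholder database name).
buffercache_query() {
  cat <<'SQL'
-- count(relfilenode) counts only buffers that actually hold a page
SELECT count(relfilenode) AS used, count(*) AS total
FROM pg_buffercache;
SQL
}
buffercache_query
# Real run would be:  psql -d bench -c "$(buffercache_query)"
```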

>
>>> Our system creates a lot of "working tables" (even 100.000) and we need
>>> to perform garbage collection (dropping obsolete tables) regularly. This
>>> often took ~ 1 hour, because we're using big AWS instances with lots of
>>> RAM (which tends to be slower than RAM on bare hw). After applying this
>>> patch and dropping tables in groups of 100, the gc runs in less than 4
>>> minutes (i.e. a 15x speed-up).
>>
>> Hm, my environment seems very different from yours. Could you show the
>> setting of shared_buffers in your environment? I'd like to make my test
>> environment as similar as possible to yours.
>
> We're using m2.4xlarge instances (70G of RAM) with 10GB shared buffers.

Thank you, that's much bigger than I expected. I'm not sure whether my boss will allow me to use such a rich environment... :(
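For reference, the measurement procedure can be sketched as a script that generates the SQL (a rough sketch; the table names, the row source, and the "bench" database name are placeholders):

```shell
#!/bin/sh
# Rough sketch of the measurement procedure; NTABLES and the names
# below are placeholders for the real 1000-table run.
NTABLES=${NTABLES:-1000}

setup_sql() {
  # Loading the rows is what fills shared buffers; a plain sequential
  # scan would mostly go through the ring buffer instead.
  i=1
  while [ "$i" -le "$NTABLES" ]; do
    echo "CREATE TABLE accounts$i (LIKE pgbench_accounts INCLUDING ALL);"
    echo "INSERT INTO accounts$i SELECT * FROM pgbench_accounts;"
    i=$((i + 1))
  done
}

drop_sql() {
  # All drops in a single transaction, so COMMIT handles them as a batch.
  echo "BEGIN;"
  i=1
  while [ "$i" -le "$NTABLES" ]; do
    echo "DROP TABLE accounts$i;"
    i=$((i + 1))
  done
  echo "COMMIT;"
}

# Real run would be something like:
#   setup_sql | psql -d bench
#   time drop_sql | psql -d bench
```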

Here are results of additional measurements on my MBA.

* stats of 1000 bare DROP TABLE statements

The 90th percentile of patched PG is just ~2% slower than master, so it seems acceptable.

         |  Patched   |  Master
---------+------------+------------
Average  |  1.595 ms  |  1.634 ms
Median   |  1.791 ms  |  1.900 ms
90%ile   |  2.517 ms  |  2.477 ms
Max      | 37.526 ms  | 24.553 ms

* Total time to complete 1000 DROP TABLEs and COMMIT

       | Patched | Master
-------+---------+---------
Bare   | 1595 ms | 1634 ms
In TX  |  672 ms | 1459 ms
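For what it's worth, the grouped garbage collection you described (dropping tables in batches of 100) could be scripted roughly like this (a sketch; the batch size, table names, and psql invocation are placeholders):

```shell
#!/bin/sh
# Sketch of grouped DROPs: one transaction per batch of $GROUP tables,
# so the patched COMMIT invalidates shared buffers once per batch
# rather than once per table.
GROUP=${GROUP:-100}

batched_drops() {
  # Read table names from stdin, emit batched DROP transactions.
  n=0
  while read -r tbl; do
    if [ "$n" -eq 0 ]; then echo "BEGIN;"; fi
    echo "DROP TABLE IF EXISTS $tbl;"
    n=$((n + 1))
    if [ "$n" -ge "$GROUP" ]; then
      echo "COMMIT;"
      n=0
    fi
  done
  if [ "$n" -gt 0 ]; then echo "COMMIT;"; fi
}

# Real run would be something like:
#   list_obsolete_tables | batched_drops | psql -d bench
```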

Regards,
--
Shigeru HANADA
shigeru(dot)hanada(at)gmail(dot)com
