Re: Turning off HOT/Cleanup sometimes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Turning off HOT/Cleanup sometimes
Date: 2014-01-15 16:47:44
Message-ID: CA+TgmoYsn2+fverxwYB+TWWQNAnvK+=+0rxVy0OGp6XBX74xag@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 14, 2014 at 4:13 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 8 January 2014 08:33, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> VACUUM cleans up blocks, which is nice because it happens offline in a
>> lazy manner.
>>
>> We also make SELECT clean up blocks as it goes. That is useful in OLTP
>> workloads, but it means that large SQL queries and pg_dump effectively
>> do much the same work as VACUUM, generating huge amounts of I/O and
>> WAL on the master, the cost and annoyance of which is experienced
>> directly by the user. That is avoided on standbys.
>>
>> The effect is that long-running statements often run much longer
>> than we want, increasing bloat as a result. It also produces wildly
>> varying response times, depending on the extent of cleanup required.
>>
>> It is a simple task to make that behaviour optional on the master.
>>
>> I propose a USERSET parameter, prune_cost_limit (<---insert better name here)
>> which will make the behaviour optional, default -1, in normal user
>> processes. VACUUM will ignore this parameter and so its actions will
>> never be deferred.
>>
>> In detail, this parameter would disable pruning for any scan larger
>> than the cost limit. So large scans will disable the behaviour. The
>> default, -1, means never disable pruning, which is the current
>> behaviour.
>>
>> We track the number of pages dirtied by the current statement. When
>> this reaches prune_cost_limit, we will apply these behaviours to all
>> shared_buffer block accesses...
>>
>> (1) avoid running heap_page_prune_opt()
>>
>> (2) avoid dirtying the buffer for hints. (This is safe because the
>> hinted changes will either be lost or will be part of the full page
>> image when we make a logged change).
>>
>> (i.e. doesn't apply to temp tables)
>>
>> For example, if we set prune_cost_limit = 4, this behaviour allows
>> small index lookups via bitmapheapscan to continue to clean up,
>> while larger index and seq scans will avoid cleanup.
>>
>>
>>
>> There would be a postgresql.conf parameter prune_cost_limit, as well
>> as a table level parameter that would prevent pruning except via
>> VACUUM.
>>
>> This will help in these ways
>> * Reduce write I/O from SELECTs and pg_dump - improving backups and BI queries
>> * Allow finer grained control over Hot Standby conflicts
>> * Potentially allow diagnostic inspection of older data via SeqScan
>>
>> Prototype patch shows this is possible and simple enough for 9.4.
>> Major objections? Or should I polish up and submit?
>
> Patch attached, implemented to reduce writes by SELECTs only.

I am still not sure whether we want this, but I think it's definitely
an improvement over the previous version. Assorted comments:

- Naming consistency seems to me to dictate that there should be more
similarity between the reloption name (allow_buffer_cleanup) and the
GUC (prune_page_dirty_limit).

- The documentation doesn't describe the use case where suppressing
cleanup on a per-table basis would be desirable, and I can't think of
one, either.

- There are a variety of ways to limit pruning; here, you've chosen to
limit it to a particular number of pruning operations per executor
invocation. But the flag is global, not part of the executor state,
so a query that calls a PL/pgSQL function during execution will reset
the counter for the parent query also, which doesn't seem very
principled.

In a patch I posted a few years ago to set hint bits only sometimes, I
settled on an algorithm where I dirtied the first 50 pages per scan
and then skipped the next 950, or something like that. The idea was
that you wanted the pages that did get dirtied to be clustered
together to avoid random I/O; and also that you wanted tables of
arbitrary size to get hinted within a certain number of scans (e.g.
20). The limiting here is much more aggressive, so on large tables it
will amount to basically no pruning at all. I dunno whether that's a
good idea or not. But if the idea of making this an integer rather
than a boolean is to allow some pruning to still happen while keeping
it checked within reasonable bounds, I'm not sure it will succeed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
