BUG #7497: Excessive toast bloat after standby promotion

From: daniel(at)heroku(dot)com
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #7497: Excessive toast bloat after standby promotion
Date: 2012-08-15 20:00:25
Message-ID: E1T1jlR-0006Lr-Op@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 7497
Logged by: Daniel Farina
Email address: daniel(at)heroku(dot)com
PostgreSQL version: 9.0.7
Operating system: Ubuntu 10.04
Description:

We have an interesting case of an application that is non-trivial but not
incredibly busy that has bloated up to over 800GB. Our charts indicate that
this bloat starts right at the time of a promotion, even though there is an
absence of code changes for quite some time both before and after. This
application had been running for many months in a constant amount of space
-- roughly 300GB -- prior to this.

There is only about 10GB of live data in this toast relation, and
sum(octet_length()) returns quite rapidly to confirm it. Although that
meant there was a lot of bloat to begin with (there was probably a large
delete at some point), predictably the usage of disk was completely stable
for a very long time as heap was re-used, and there were no problems.

We have tried adjusting the autovacuum cost parameters to be very aggressive
(10000 cost-limit) and also ran a manual vacuum verbose for some time and
got some output, even though we had to cancel it after a a few hours because
it was causing some problems. Yet, it does claim to be marking roughly 20GB
at a time as free. In spite of that, there is no apparent inflection at all
to the rate of bloat accumulation, which seems to be roughly the rate of
information churn.

Our main workload is a series of:

UPDATE tbl SET str = str || 'more string literal';

Periodically, this field is removed via nullification:

UPDATE tbl SET str = NULL;

The vacuum trace is as follows.

INFO: vacuuming "pg_toast.pg_toast_16394"
INFO: scanned index "pg_toast_16394_index" to remove 11184521 row versions
DETAIL: CPU 8.42s/29.90u sec elapsed 318.26 sec.
INFO: "pg_toast_16394": removed 11184521 row versions in 2494386 pages
DETAIL: CPU 40.10s/18.78u sec elapsed 1334.48 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184522 row versions
DETAIL: CPU 7.34s/30.13u sec elapsed 263.07 sec.
INFO: "pg_toast_16394": removed 11184522 row versions in 2476585 pages
DETAIL: CPU 33.62s/18.83u sec elapsed 661.06 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184521 row versions
DETAIL: CPU 7.29s/27.62u sec elapsed 235.69 sec.
INFO: "pg_toast_16394": removed 11184521 row versions in 2461097 pages
DETAIL: CPU 34.35s/18.74u sec elapsed 669.34 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184522 row versions
DETAIL: CPU 7.25s/26.05u sec elapsed 233.71 sec.
INFO: "pg_toast_16394": removed 11184522 row versions in 2473206 pages
DETAIL: CPU 35.11s/19.04u sec elapsed 652.17 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184521 row versions
DETAIL: CPU 7.55s/23.78u sec elapsed 238.95 sec.
INFO: "pg_toast_16394": removed 11184521 row versions in 2470127 pages
DETAIL: CPU 35.33s/19.17u sec elapsed 657.84 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184523 row versions
DETAIL: CPU 7.53s/22.20u sec elapsed 230.27 sec.
INFO: "pg_toast_16394": removed 11184523 row versions in 2565300 pages
DETAIL: CPU 36.63s/19.53u sec elapsed 696.75 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184523 row versions
DETAIL: CPU 8.35s/19.89u sec elapsed 243.68 sec.
INFO: "pg_toast_16394": removed 11184523 row versions in 2529355 pages
DETAIL: CPU 35.89s/19.67u sec elapsed 640.45 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184523 row versions
DETAIL: CPU 8.25s/18.06u sec elapsed 270.73 sec.
INFO: "pg_toast_16394": removed 11184523 row versions in 2493637 pages
DETAIL: CPU 37.06s/20.70u sec elapsed 699.94 sec.
INFO: scanned index "pg_toast_16394_index" to remove 11184523 row versions
DETAIL: CPU 9.12s/16.02u sec elapsed 264.31 sec.
INFO: "pg_toast_16394": removed 11184523 row versions in 2489705 pages
DETAIL: CPU 35.51s/21.36u sec elapsed 655.56 sec.
^CCancel request sent
ERROR: canceling statement due to user request

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2012-08-15 20:50:08 Re: BUG #6178: date_trunc : interval units "week" not supported contradicts documentation
Previous Message Kevin Grittner 2012-08-15 19:38:54 Re: BUG #7495: chosen wrong index