Re: pg_autovacuum: short, wide tables

Lists: pgsql-bugs
From: mark reid <mail(at)markreid(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Subject: pg_autovacuum: short, wide tables
Date: 2005-07-07 17:33:11
Message-ID: 42CD6757.6050004@markreid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Hi,

I've been using pg_autovacuum for a while, and for the most part it's
been great. There's one case in my system where it won't run on a
particular type of table, even though the table apparently needs it.

I have a table called "properties" that has key->value pairs. Usually
there are only a handful of rows, some of which are updated relatively
frequently compared to the number of rows (hundreds or thousands of
times per day). The problem is that some of the rows have long strings
for their value (on the order of a few hundred kilobytes), so if I
understand correctly, the bulk of the data gets offloaded to a toast
table.

What I believe is happening is that the main table doesn't meet the
minimum activity level for pg_autovacuum based on size / update
frequency, but the toast table would, though it isn't specifically
checked by pg_autovacuum. The result is that the toast table grows
really big before triggering autovacuum (or until I manually vacuum the
"properties" table). Not the end of the world, obviously, but might be
a "gotcha" for some people with similar situations.

Below is a snippet of output from a run of vacuumdb --full --analyze
--verbose that should illustrate the problem.

-Mark.

Table Def:
Table "schema_name.properties"
Column | Type | Modifiers
--------+-------------------+-----------
name | character varying |
value | character varying |
Indexes:
"properties_name_key" unique, btree (name)

Vacuum verbose output:

INFO: vacuuming "schema_name.properties"
INFO: "properties": found 1361 removable, 8 nonremovable row versions
in 172 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 63 to 1705 bytes long.
There were 4827 unused item pointers.
Total free space (including removable row versions) is 1376288 bytes.
164 pages are or will become empty, including 0 at the end of the table.
172 pages containing 1376288 free bytes are potential move destinations.
CPU 0.00s/0.00u sec elapsed 0.06 sec.
INFO: index "properties_name_key" now contains 8 row versions in 15 pages
DETAIL: 1361 index row versions were removed.
8 index pages have been deleted, 8 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO: "properties": moved 8 row versions, truncated 172 to 1 pages
DETAIL: CPU 0.00s/0.00u sec elapsed 0.02 sec.
INFO: index "properties_name_key" now contains 8 row versions in 15 pages
DETAIL: 8 index row versions were removed.
8 index pages have been deleted, 8 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_10043014"
INFO: "pg_toast_10043014": found 21052 removable, 24 nonremovable row
versions in 21100 pages
DETAIL: 0 dead row versions cannot be removed yet.
Nonremovable row versions range from 1117 to 2030 bytes long.
There were 63278 unused item pointers.
Total free space (including removable row versions) is 172044376 bytes.
21093 pages are or will become empty, including 0 at the end of the table.
21096 pages containing 172044264 free bytes are potential move destinations.
CPU 0.41s/0.06u sec elapsed 3.63 sec.
INFO: index "pg_toast_10043014_index" now contains 24 row versions in
321 pages
DETAIL: 21052 index row versions were removed.
317 index pages have been deleted, 317 are currently reusable.
CPU 0.00s/0.01u sec elapsed 0.40 sec.
INFO: "pg_toast_10043014": moved 24 row versions, truncated 21100 to 6
pages
DETAIL: CPU 0.32s/1.04u sec elapsed 5.27 sec.
INFO: index "pg_toast_10043014_index" now contains 24 row versions in
321 pages
DETAIL: 24 index row versions were removed.
317 index pages have been deleted, 317 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: analyzing "schema_name.properties"
INFO: "properties": 1 pages, 8 rows sampled, 8 estimated total rows


From: "Matthew T(dot) O'Connor" <matthew(at)tocr(dot)com>
To: mark reid <mail(at)markreid(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-07 20:23:04
Message-ID: 42CD8F28.7050600@tocr.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

mark reid wrote:

> I've been using pg_autovacuum for a while, and for the most part it's
> been great. There's one case in my system where it won't run on a
> particular type of table, even though the table apparently needs it.
>
> I have a table called "properties" that has key->value pairs. Usually
> there are only a handful of rows, some of which are updated relatively
> frequently compared to the number of rows (hundreds or thousands of
> times per day). The problem is that some of the rows have long
> strings for their value (on the order of a few hundred kilobytes), so
> if I understand correctly, the bulk of the data gets offloaded to a
> toast table.
> What I believe is happening is that the main table doesn't meet the
> minimum activity level for pg_autovacuum based on size / update
> frequency, but the toast table would, though it isn't specifically
> checked by pg_autovacuum. The result is that the toast table grows
> really big before triggering autovacuum (or until I manually vacuum
> the "properties" table). Not the end of the world, obviously, but
> might be a "gotcha" for some people with similar situations.

I don't think the problem has to do with toast, or pg_autovacuum missing
the fact that the toast table has been updated. Rather I think the
problem is that autovacuum believes that all updates are created equal.
That is 1 update is 1 update even though a single update may effect 1
page or thousands of pages. This is where FSM data needs to be
integrated into pg_autovacuum. This isn't going to happen soon (not for
8.1) but it is definatley planned.

With the inclusion of autovacuum into the backend for 8.1, you will at
least be able to set per table thresholds and set a more aggressive
threshold for this table.

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Matthew T(dot) O'Connor" <matthew(at)tocr(dot)com>
Cc: mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 15:23:11
Message-ID: 22767.1120836191@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Matthew T. O'Connor" <matthew(at)tocr(dot)com> writes:
> mark reid wrote:
>> What I believe is happening is that the main table doesn't meet the
>> minimum activity level for pg_autovacuum based on size / update
>> frequency, but the toast table would, though it isn't specifically
>> checked by pg_autovacuum.

> I don't think the problem has to do with toast, or pg_autovacuum missing
> the fact that the toast table has been updated. Rather I think the
> problem is that autovacuum believes that all updates are created equal.

I think Mark is probably on to something. The activity in the toast
table will show as deletes *in the toast table* ... and that activity
fails to show at all in the pg_stat_activity view, because it shows
only plain relations! So unless autovacuum is ignoring the stats views
and going directly to the underlying stats functions, it cannot see
at all that there is excessive activity in the toast table.

It strikes me that this is a definitional bug in the stats views.
Either we should change the filter to be "regular and toast tables",
or we should add columns to show activity in a table's toast table,
or we should just add the activity in the toast table to the parent
table's activity columns.

The first of these would be easiest but it seems quite likely to break
applications (eg, if unmodified, autovacuum would probably try to issue
vacuums against toast tables). And the last seems to be confusing.
So I think I favor adding columns.

regards, tom lane


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 16:25:40
Message-ID: 42CEA904.6060605@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>I think Mark is probably on to something. The activity in the toast
>table will show as deletes *in the toast table* ... and that activity
>fails to show at all in the pg_stat_activity view, because it shows
>only plain relations! So unless autovacuum is ignoring the stats views
>and going directly to the underlying stats functions, it cannot see
>at all that there is excessive activity in the toast table.
>
>

I think I'm missing something here. If I have a table t1 with a long
text column, and I do an update on that text column, doesn't that show
up as an update on table t1? And when there are enough upd/del
autovacuum will issue a VACUUM against t1, which will clean up the
associated toast table, right? So I think I must be missing something.
Could you please explain the problem in a little more detail.

>It strikes me that this is a definitional bug in the stats views.
>Either we should change the filter to be "regular and toast tables",
>or we should add columns to show activity in a table's toast table,
>or we should just add the activity in the toast table to the parent
>table's activity columns.
>
>

Ok, maybe I get it now, are you saying that if I do:
update t1 set "long text column" = "some very long text value"
and the update doesn't touch any non-toast columns that the stats system
will not show that update against t1? If so, that is a problem.

>The first of these would be easiest but it seems quite likely to break
>applications (eg, if unmodified, autovacuum would probably try to issue
>vacuums against toast tables). And the last seems to be confusing.
>So I think I favor adding columns.
>
>

Shouldn't the update to the toast table just be considered an update to
table t1? The fact that there is an underlying toast table is an
implementation detail that I don't think should show up in the stats system.

Matt


From: Mark Reid <mail(at)markreid(dot)org>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 16:48:22
Message-ID: 42CEAE56.3070704@markreid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Matthew T. O'Connor wrote:

> Tom Lane wrote:
>
>> I think Mark is probably on to something. The activity in the toast
>> table will show as deletes *in the toast table* ... and that activity
>> fails to show at all in the pg_stat_activity view, because it shows
>> only plain relations! So unless autovacuum is ignoring the stats views
>> and going directly to the underlying stats functions, it cannot see
>> at all that there is excessive activity in the toast table.
>>
>>
>
> I think I'm missing something here. If I have a table t1 with a long
> text column, and I do an update on that text column, doesn't that show
> up as an update on table t1? And when there are enough upd/del
> autovacuum will issue a VACUUM against t1, which will clean up the
> associated toast table, right? So I think I must be missing
> something. Could you please explain the problem in a little more detail.

I think the issue is that a single update to the main table causes a
whole bunch of updates to the toast table. So in my case (with the
vacuum output attached previously), a thousand updates to the main table
entails tens of thousands of updates to the toast table.
INFO: "properties": found 1361 removable, 8 nonremovable row versions
INFO: "pg_toast_10043014": found 21052 removable, 24 nonremovable row
versions

based on the default autovacuum thresholds, 21000 updates to a table
with 24 rows should have triggered a vacuum on the toast table, which is
why i pointed the finger that way originally.

-Mark.


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 16:55:17
Message-ID: 20050708165517.GB1915@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

> Tom Lane wrote:

> >The first of these would be easiest but it seems quite likely to break
> >applications (eg, if unmodified, autovacuum would probably try to issue
> >vacuums against toast tables). And the last seems to be confusing.
> >So I think I favor adding columns.

Hmm. With integrated autovacuum, we could set something up to issue
vacuums separately to TOAST tables and the main table. It'd probably be
a tad easier if the toast stats are separate from the main table; and an
autovac of the main table not necessarily would invoke vacuuming the
toast table.

I'm not proposing it for 8.1 though ...

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Las cosas son buenas o malas segun las hace nuestra opinión" (Lisias)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Reid <mail(at)markreid(dot)org>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 16:57:28
Message-ID: 24101.1120841848@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Mark Reid <mail(at)markreid(dot)org> writes:
> I think the issue is that a single update to the main table causes a
> whole bunch of updates to the toast table. So in my case (with the
> vacuum output attached previously), a thousand updates to the main table
> entails tens of thousands of updates to the toast table.

Exactly. If autovac were looking at the properties of the toast table
it would think a vacuum pass was warranted sooner than it thinks from
just looking at the main table.

Admittedly this doesn't come into play unless you have a fairly large
number of toast chunks per main-table row, so the rows in question have
to be really wide (dozens of KB even after compression) before it gets
to be a big deal.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 17:00:32
Message-ID: 24129.1120842032@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:
> Shouldn't the update to the toast table just be considered an update to
> table t1? The fact that there is an underlying toast table is an
> implementation detail that I don't think should show up in the stats system.

At the level of the stats system, though, you are interested in
"implementation details". The fact that there is such a concept as an
index is an implementation detail according to the SQL standard --- but
if we hid that we wouldn't be able to show things that people want to
know.

In particular, I think people would like to be able to use the stats
views to see how much toast-related I/O is going on, and not have that
smushed together with main-table I/O.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 17:11:35
Message-ID: 24215.1120842695@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
>> Tom Lane wrote:
>>> The first of these would be easiest but it seems quite likely to break
>>> applications (eg, if unmodified, autovacuum would probably try to issue
>>> vacuums against toast tables). And the last seems to be confusing.
>>> So I think I favor adding columns.

> Hmm. With integrated autovacuum, we could set something up to issue
> vacuums separately to TOAST tables and the main table. It'd probably be
> a tad easier if the toast stats are separate from the main table; and an
> autovac of the main table not necessarily would invoke vacuuming the
> toast table.

> I'm not proposing it for 8.1 though ...

Well, why not? Arguably Mark's problem is a bug, and it's not too late
to address bugs.

I checked what actually happens if you try to vacuum a toast table:

regression=# vacuum pg_toast.pg_toast_169901;
WARNING: skipping "pg_toast_169901" --- cannot vacuum indexes, views, or special system tables
VACUUM
regression=#

So it's not as bad as I thought. Maybe just weakening the filter in the
stats tables views isn't a bad idea after all. Furthermore, we could
allow VACUUM on a toast table to go through (is there any good reason
to disallow it?) and then autovacuum would do more or less the right
things with no further changes.

I'm not sure about the idea of not vacuuming the toast table when we
decide to vacuum the main table.

regards, tom lane


From: Mark Reid <mail(at)markreid(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 17:19:49
Message-ID: 42CEB5B5.1090703@markreid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


>>I'm not proposing it for 8.1 though ...
>>
>>
>
>Well, why not? Arguably Mark's problem is a bug, and it's not too late
>to address bugs.
>
>
FWIW, I'd be satisfied with a warning in the autovacuum docs about this
corner case.

>I'm not sure about the idea of not vacuuming the toast table when we
>decide to vacuum the main table.
>
>
If you promote toast tables to autovacuum candidates that can be
vacuumed independently, I think autovacuum doesn't need to do both when
it does the main table. This would potentially improve performance by
minimizing the amount of work that needs to be done when a vacuum is
necessary. OTOH, you can't stop regular vacuum from including the toast
table otherwise way more people would start bugging you :)

-Mark.


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 17:21:21
Message-ID: 42CEB611.7020107@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>"Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:
>
>
>>Shouldn't the update to the toast table just be considered an update to
>>table t1? The fact that there is an underlying toast table is an
>>implementation detail that I don't think should show up in the stats system.
>>
>>
>
>At the level of the stats system, though, you are interested in
>"implementation details". The fact that there is such a concept as an
>index is an implementation detail according to the SQL standard --- but
>if we hid that we wouldn't be able to show things that people want to
>know.
>
>In particular, I think people would like to be able to use the stats
>views to see how much toast-related I/O is going on, and not have that
>smushed together with main-table I/O.
>

Fair enough, but how are you planning to display the data, if the stat
system just reports that there was an update to a corresponding toast
table, that still isn't going to tell us how many pages that updated
effected, and then we are back to the all updates are not created equal
problem. Currently autovac doesn't look at the block level stats, maybe
it should for this reason.


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 17:28:04
Message-ID: 42CEB7A4.8070509@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:

>>Hmm. With integrated autovacuum, we could set something up to issue
>>vacuums separately to TOAST tables and the main table. It'd probably be
>>a tad easier if the toast stats are separate from the main table; and an
>>autovac of the main table not necessarily would invoke vacuuming the
>>toast table.
>>
>>
>>I'm not proposing it for 8.1 though ...
>>
>>
>
>Well, why not? Arguably Mark's problem is a bug, and it's not too late
>to address bugs.
>
>I checked what actually happens if you try to vacuum a toast table:
>
>regression=# vacuum pg_toast.pg_toast_169901;
>WARNING: skipping "pg_toast_169901" --- cannot vacuum indexes, views, or special system tables
>VACUUM
>regression=#
>
>So it's not as bad as I thought. Maybe just weakening the filter in the
>stats tables views isn't a bad idea after all. Furthermore, we could
>allow VACUUM on a toast table to go through (is there any good reason
>to disallow it?) and then autovacuum would do more or less the right
>things with no further changes.
>
>

That would be a nice improvement. Any increase in the granularity of
vacuum can't be a bad thing.

>I'm not sure about the idea of not vacuuming the toast table when we
>decide to vacuum the main table.
>

Perhaps by default issuing a vacuum against a table will also vacuum
it's associated toast tables (can there be more than one toast table?)
but add an option to the vacuum command not to vacuum the toast table.
Or, perhaps this shouldn't be made totally public, ie. proving sql
commands for it, perhaps it should only be available internally, that is
to autovacuum and other tools that might issue a vacuum at the same low
level as autovacuum.

Matt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 17:29:03
Message-ID: 24379.1120843743@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:
> Tom Lane wrote:
>> In particular, I think people would like to be able to use the stats
>> views to see how much toast-related I/O is going on, and not have that
>> smushed together with main-table I/O.

> Fair enough, but how are you planning to display the data,

At the moment I'm thinking "just like a regular table" --- see my
last message to Alvaro. The existing backend code will count each
toast-chunk insert or delete just like a normal row insert or delete,
so AFAICS this will produce sane-looking stats that autovac could
use the same way as for a plain table.

The main bit of additional logic that might be needed is an awareness
that firing a VACUUM on a main table will implicitly fire one on its
toast table, and so you'd not want to go and issue the toast table
VACUUM separately.

regards, tom lane


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 18:17:39
Message-ID: 20050708181739.GF1915@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Jul 08, 2005 at 01:29:03PM -0400, Tom Lane wrote:
> "Matthew T. O'Connor" <matthew(at)zeut(dot)net> writes:
> > Tom Lane wrote:
> >> In particular, I think people would like to be able to use the stats
> >> views to see how much toast-related I/O is going on, and not have that
> >> smushed together with main-table I/O.
>
> > Fair enough, but how are you planning to display the data,
>
> At the moment I'm thinking "just like a regular table" --- see my
> last message to Alvaro. The existing backend code will count each
> toast-chunk insert or delete just like a normal row insert or delete,
> so AFAICS this will produce sane-looking stats that autovac could
> use the same way as for a plain table.
>
> The main bit of additional logic that might be needed is an awareness
> that firing a VACUUM on a main table will implicitly fire one on its
> toast table, and so you'd not want to go and issue the toast table
> VACUUM separately.

I don't see any reason why this wouldn't work. I even think it'd be
very easy to implement (from the autovacuum POV -- not sure about the
stat system). Furthermore, the awareness you mention is also very easy
to implement: we just need to make sure the pg_autovacuum tuple for the
toast table is updated when it is vacuumed, which is just an additional
function call.

Have you looked at the autovacuum integration patch? There are some
ugly points on which I'd like your comments. For example, currently
vacuum() takes a RangeVar parameter; it'd be cleaner if it used a Relid
instead, and we provided a RangeVar wrapper. Also I'm not sure about
the shutdown sequence, mainly the bit about delaying the bgwriter shut
down until after autovac has already shut down.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La conclusión que podemos sacar de esos estudios es que
no podemos sacar ninguna conclusión de ellos" (Tanenbaum)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 18:35:14
Message-ID: 26526.1120847714@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> On Fri, Jul 08, 2005 at 01:29:03PM -0400, Tom Lane wrote:
>> The main bit of additional logic that might be needed is an awareness
>> that firing a VACUUM on a main table will implicitly fire one on its
>> toast table, and so you'd not want to go and issue the toast table
>> VACUUM separately.

> I don't see any reason why this wouldn't work. I even think it'd be
> very easy to implement (from the autovacuum POV -- not sure about the
> stat system). Furthermore, the awareness you mention is also very easy
> to implement: we just need to make sure the pg_autovacuum tuple for the
> toast table is updated when it is vacuumed, which is just an additional
> function call.

I'm having some second thoughts about allowing VACUUM on a toast table
independently of its parent table --- it's a bit scary to be messing
with the toast table when we have no lock at all on the parent. It
might work OK, but I'm not sure I want to take the risk. If we simply
expose toast tables in the stats views, what has to be done to
autovacuum to get it to work properly? ("Properly" in this case would
mean "fire a VACUUM on the parent table if either it or its toast table
look like they need vacuumed".) Is this much worse than what you
say above?

> Have you looked at the autovacuum integration patch?

No, but will do.

regards, tom lane


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 18:54:57
Message-ID: 42CECC01.10308@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>I'm having some second thoughts about allowing VACUUM on a toast table
>independently of its parent table --- it's a bit scary to be messing
>with the toast table when we have no lock at all on the parent. It
>might work OK, but I'm not sure I want to take the risk. If we simply
>expose toast tables in the stats views, what has to be done to
>autovacuum to get it to work properly? ("Properly" in this case would
>mean "fire a VACUUM on the parent table if either it or its toast table
>look like they need vacuumed".) Is this much worse than what you
>say above?
>
>

Oh well.... It sounded like a good idea :-)

No it's not much worse, in fact handling this is much like the way we
handle vacuum analyze and analyze only separately. So the
infrastructure is mostly there, it will require some tweaks to the
patch, but nothing large.

Matt


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 19:38:23
Message-ID: 20050708193823.GG1915@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Fri, Jul 08, 2005 at 02:35:14PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> > On Fri, Jul 08, 2005 at 01:29:03PM -0400, Tom Lane wrote:
> >> The main bit of additional logic that might be needed is an awareness
> >> that firing a VACUUM on a main table will implicitly fire one on its
> >> toast table, and so you'd not want to go and issue the toast table
> >> VACUUM separately.
>
> > I don't see any reason why this wouldn't work. I even think it'd be
> > very easy to implement (from the autovacuum POV -- not sure about the
> > stat system). Furthermore, the awareness you mention is also very easy
> > to implement: we just need to make sure the pg_autovacuum tuple for the
> > toast table is updated when it is vacuumed, which is just an additional
> > function call.
>
> I'm having some second thoughts about allowing VACUUM on a toast table
> independently of its parent table --- it's a bit scary to be messing
> with the toast table when we have no lock at all on the parent. It
> might work OK, but I'm not sure I want to take the risk. If we simply
> expose toast tables in the stats views, what has to be done to
> autovacuum to get it to work properly? ("Properly" in this case would
> mean "fire a VACUUM on the parent table if either it or its toast table
> look like they need vacuumed".) Is this much worse than what you
> say above?

No, just a little bit more logic, I think. It requires storing info
about toast tables in pg_autovacuum, but I don't see a problem with
that. Currently there's a heap scan of pg_autovacuum, on which we would
need to skip toast tables and consider them only in conjunction with
their respective main table.

Another issue altogether is the stat system. I don't know if it stores
info about toast tables. Note that it doesn't matter that it doesn't
show up in the pg_stat views, because we don't use those; rather we
access the hash tables directly.

> > Have you looked at the autovacuum integration patch?
>
> No, but will do.

Cool.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 20:02:36
Message-ID: 27909.1120852956@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Another issue altogether is the stat system. I don't know if it stores
> info about toast tables.

Yes, it does --- you can pull out the info, even, if you use the stats
access functions directly. It's just that the views filter on relkind =
'r'.

> Note that it doesn't matter that it doesn't show up in the pg_stat
> views, because we don't use those; rather we access the hash tables
> directly.

Ah, but this is only true in the integrated version no? The contrib
version sure looks like it's depending on the views.

regards, tom lane


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, mark reid <mail(at)markreid(dot)org>, pgsql-bugs(at)postgresql(dot)org, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-08 20:22:22
Message-ID: 42CEE07E.1000208@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:

>Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
>
>
>>Note that it doesn't matter that it doesn't show up in the pg_stat
>>views, because we don't use those; rather we access the hash tables
>>directly.
>>
>>
>
>Ah, but this is only true in the integrated version no? The contrib
>version sure looks like it's depending on the views.
>

Correct.


From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Reid <mail(at)markreid(dot)org>, "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: pg_autovacuum: short, wide tables
Date: 2005-07-11 18:09:36
Message-ID: 42D2B5E0.5070801@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 7/8/2005 12:57 PM, Tom Lane wrote:

> Mark Reid <mail(at)markreid(dot)org> writes:
>> I think the issue is that a single update to the main table causes a
>> whole bunch of updates to the toast table. So in my case (with the
>> vacuum output attached previously), a thousand updates to the main table
>> entails tens of thousands of updates to the toast table.
>
> Exactly. If autovac were looking at the properties of the toast table
> it would think a vacuum pass was warranted sooner than it thinks from
> just looking at the main table.
>
> Admittedly this doesn't come into play unless you have a fairly large
> number of toast chunks per main-table row, so the rows in question have
> to be really wide (dozens of KB even after compression) before it gets
> to be a big deal.

I think this only becomes an issue if the toasted columns not only tend
to have vastly different data sizes, but in addition that most of the
updates actually happen on wide rows. Otherwise, the percentage of dead
tuples in the main and toast table should be fairly similar.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#================================================== JanWieck(at)Yahoo(dot)com #