Re: issue log message to suggest VACUUM FULL if a table is nearly empty

Lists: pgsql-hackers
From: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-10 00:28:30
Message-ID: F40B0968DB0A904DA78A924E633BE78645FAAF@SYDEXCHTMP2.au.fjanz.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi

Enclosed is the patch to implement the requirement that issue log
message to suggest VACUUM FULL if a table is nearly empty.

The requirement comes from the Postgresql TODO list.

[Benefit]

To find which table is nearly empty and suggest using 'VACUUM FULL' to
release the unused disk space this table occupied.

[Analysis]

A table is nearly empty include two scenario:

1. The table occupy small disk size and contains few unused rows.

2. The table occupy large disk size and contains large numbers of unused
rows.

Obviously the requirement is used to release the disk in the scenario2.

[Solution details]

A check function is added in the function 'lazy_vacuum_rel' to check if
the table is large enough and contains large numbers of unused rows. If
it is then issue a log message that suggesting using 'VACUUM FULL' on
the table.

The judgement policy is as following:

If the relpage of the table > RELPAGES_VALUES_THRESHOLD(default 1000)
then the table is considered to be large enough.

If the free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD(default
0.5) then the table is considered to have large numbers of unused rows.

The free_space is calculated by reading the details from the FSM pages.
This may increase the IO, but expecting very less FSM pages thus it
shouldn't cause

Any problems. Please let me know your suggestions.

[When the log message prints]

When executing SQL command 'VACUUM' or 'VACUUM on a table', this
function will be invoked and may issue the log message if the table
reach the condition.

When auto vacuum work and execute 'VACUUM on a table', this function
will be invoked and may issue the log message if the table reach the
condition.

[Example]

SELECT count(*) from t5;

count

-------

3000

(1 row)

DELETE FROM t5 where f1<2900;

DELETE 2899

SELECT count(*) from t5;

count

-------

101

(1 row)

LOG: automatic vacuum of table "wjdb.public.t5": index scans: 0

pages: 0 removed, 20
remain

tuples: 2899 removed,
101 remain, 0 are dead but not yet removable

buffer usage: 64 hits, 1
misses, 25 dirtied

avg read rate: 0.130
MB/s, avg write rate: 3.261 MB/s

system usage: CPU
0.00s/0.00u sec elapsed 0.05 sec

LOG: Table "t5" contains large numbers of unused row, suggest using
VACUUM FULL on it!

VACUUM t5;

LOG: Table "t5" contains large numbers of unused row, suggest using
VACUUM FULL on it!

Kind regards

Jing Wang

Fujitsu Australia

Attachment Content-Type Size
vacuum_v1.patch application/octet-stream 2.3 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-10 05:24:10
Message-ID: CAA4eK1JO1c+-JymrsG9JcVo3kbYyx43+FArZGDfRy+ndeuPh5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
> Enclosed is the patch to implement the requirement that issue log message to
> suggest VACUUM FULL if a table is nearly empty.
>
> The requirement comes from the Postgresql TODO list.
>
> [Solution details]
>
> A check function is added in the function 'lazy_vacuum_rel' to check if the
> table is large enough and contains large numbers of unused rows. If it is
> then issue a log message that suggesting using 'VACUUM FULL' on the table.
>
> The judgement policy is as following:
>
> If the relpage of the table > RELPAGES_VALUES_THRESHOLD(default 1000) then
> the table is considered to be large enough.
>
> If the free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
> then the table is considered to have large numbers of unused rows.
>
> The free_space is calculated by reading the details from the FSM pages. This
> may increase the IO, but expecting very less FSM pages thus it shouldn't
> cause

I think it would be better if we can use some existing stats to issue warning
message rather than traversing the FSM for all pages. For example after
vacuuming page in lazy_scan_heap(), we update the freespace for page.
You can refer below line in lazy_scan_heap().
freespace = PageGetHeapFreeSpace(page);

Now it might be possible that we might not get freespace info easily as
it is not accumulated for previous vacuum's. Incase there is no viable
way to get it through vacuum stats, we are already updating fsm after
vacuum by FreeSpaceMapVacuum(), where I think it should be possible
to get freespace.

In general, I think idea to log a message for Vaccum Full is okay, but it would
be more viable if we can do that without any additional cost.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-10 07:43:20
Message-ID: CAJrrPGdNOva7D6YOLUZAnycgdPRzbT1d7dBnYeM-gmkE=oE65g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
> > Enclosed is the patch to implement the requirement that issue log message to
> > suggest VACUUM FULL if a table is nearly empty.
> >
> > The requirement comes from the Postgresql TODO list.
> >
> > [Solution details]
> >
> > A check function is added in the function 'lazy_vacuum_rel' to check if the
> > table is large enough and contains large numbers of unused rows. If it is
> > then issue a log message that suggesting using 'VACUUM FULL' on the table.
> >
> > The judgement policy is as following:
> >
> > If the relpage of the table > RELPAGES_VALUES_THRESHOLD(default 1000) then
> > the table is considered to be large enough.
> >
> > If the free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
> > then the table is considered to have large numbers of unused rows.
> >
> > The free_space is calculated by reading the details from the FSM pages. This
> > may increase the IO, but expecting very less FSM pages thus it shouldn't
> > cause
>
> I think it would be better if we can use some existing stats to issue warning
> message rather than traversing the FSM for all pages. For example after
> vacuuming page in lazy_scan_heap(), we update the freespace for page.
> You can refer below line in lazy_scan_heap().
> freespace = PageGetHeapFreeSpace(page);
>
> Now it might be possible that we might not get freespace info easily as
> it is not accumulated for previous vacuum's. Incase there is no viable
> way to get it through vacuum stats, we are already updating fsm after
> vacuum by FreeSpaceMapVacuum(), where I think it should be possible
> to get freespace.

yes this way it works without extra penalty. But the problem is how to calculate
the free space which is left in the skipped pages because of visibility bit.

In a normal scenario, the pages which are getting skipped during vacuum process
are less in number means then this approach is a good choice.

Regards,
Hari Babu
Fujitsu Australia


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-11 03:59:40
Message-ID: CAA4eK1+Vx-1RAe9pjBVgVqBA2kp4ih2Ub1UhQv-mPiu=AHFuFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
> On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
>> > Enclosed is the patch to implement the requirement that issue log message to
>> > suggest VACUUM FULL if a table is nearly empty.
>> >
>> > The requirement comes from the Postgresql TODO list.
>> >
>> I think it would be better if we can use some existing stats to issue warning
>> message rather than traversing the FSM for all pages. For example after
>> vacuuming page in lazy_scan_heap(), we update the freespace for page.
>> You can refer below line in lazy_scan_heap().
>> freespace = PageGetHeapFreeSpace(page);
>>
>> Now it might be possible that we might not get freespace info easily as
>> it is not accumulated for previous vacuum's. Incase there is no viable
>> way to get it through vacuum stats, we are already updating fsm after
>> vacuum by FreeSpaceMapVacuum(), where I think it should be possible
>> to get freespace.
>
> yes this way it works without extra penalty. But the problem is how to calculate
> the free space which is left in the skipped pages because of visibility bit.

One way could be by extrapolating (vac_estimate_reltuples) like we do for
some other stats, but not sure if we can get the correct estimates. The
main reason is that if you observe that code path, all the decisions are
mainly done on the basis of vacrelstats. I have not checked in detail if by
using any other stats, this purpose can be achieved, may be once you can
look into it.

By the way have you checked if FreeSpaceMapVacuum() can serve your
purpose, because this call already traverses FSM in depth-first order to
update the freespace. So may be by using this call or wrapper on this
such that it returns total freespace as well apart from updating freespace
can serve the need.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-12 06:52:27
Message-ID: CAJrrPGcid6MEhB0bC=VQ1k2ruU4V2+ZFbFWO+zcMDshhnAAaOw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Mon, Mar 10, 2014 at 1:13 PM, Haribabu Kommi
> <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>> On Mon, Mar 10, 2014 at 4:24 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> On Mon, Mar 10, 2014 at 5:58 AM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
>>> > Enclosed is the patch to implement the requirement that issue log message to
>>> > suggest VACUUM FULL if a table is nearly empty.
>>> >
>>> > The requirement comes from the Postgresql TODO list.
>>> >
>>> I think it would be better if we can use some existing stats to issue warning
>>> message rather than traversing the FSM for all pages. For example after
>>> vacuuming page in lazy_scan_heap(), we update the freespace for page.
>>> You can refer below line in lazy_scan_heap().
>>> freespace = PageGetHeapFreeSpace(page);
>>>
>>> Now it might be possible that we might not get freespace info easily as
>>> it is not accumulated for previous vacuum's. Incase there is no viable
>>> way to get it through vacuum stats, we are already updating fsm after
>>> vacuum by FreeSpaceMapVacuum(), where I think it should be possible
>>> to get freespace.
>>
>> yes this way it works without extra penalty. But the problem is how to calculate
>> the free space which is left in the skipped pages because of visibility bit.
>
> One way could be by extrapolating (vac_estimate_reltuples) like we do for
> some other stats, but not sure if we can get the correct estimates. The
> main reason is that if you observe that code path, all the decisions are
> mainly done on the basis of vacrelstats. I have not checked in detail if by
> using any other stats, this purpose can be achieved, may be once you can
> look into it.

I checked the vac_estimate_reltuples() function, but not able to find
a proper way to identify the free space.

> By the way have you checked if FreeSpaceMapVacuum() can serve your
> purpose, because this call already traverses FSM in depth-first order to
> update the freespace. So may be by using this call or wrapper on this
> such that it returns total freespace as well apart from updating freespace
> can serve the need.

Thanks for information. we can get the table free space by writing some wrapper
or modify a little bit of FreeSpaceMapVacuum() function. This way it
will not add
any extra overhead in identifying the table is almost empty or not.

Regards,
Hari Babu
Fujitsu Australia


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-14 03:42:23
Message-ID: CAA4eK1+3jZ0+4jb2N22pqV7qBBkwnp01TVHAbLGDbiWj04r1cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 12, 2014 at 12:22 PM, Haribabu Kommi
<kommi(dot)haribabu(at)gmail(dot)com> wrote:
> On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
>> By the way have you checked if FreeSpaceMapVacuum() can serve your
>> purpose, because this call already traverses FSM in depth-first order to
>> update the freespace. So may be by using this call or wrapper on this
>> such that it returns total freespace as well apart from updating freespace
>> can serve the need.
>
> Thanks for information. we can get the table free space by writing some wrapper
> or modify a little bit of FreeSpaceMapVacuum() function.

I think it might be okay to even change this API to return the FreeSpace, as the
other place it is used is for Index Vacuum, so even if we don't have
any intention
to print such a message for index in this patch, but similar
information could be
useful there as well to suggest a user that index has lot of free space.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-19 00:55:49
Message-ID: E78797E1DE682D4C817419A2F599CCFE012BA6@SYD1214
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday, 14 March 2014 2:42 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>On Wed, Mar 12, 2014 at 12:22 PM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>> On Tue, Mar 11, 2014 at 2:59 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>>> By the way have you checked if FreeSpaceMapVacuum() can serve your
>>> purpose, because this call already traverses FSM in depth-first
>>> order to update the freespace. So may be by using this call or
>>> wrapper on this such that it returns total freespace as well apart
>>> from updating freespace can serve the need.
>>
>> Thanks for information. we can get the table free space by writing
>> some wrapper or modify a little bit of FreeSpaceMapVacuum() function.

> I think it might be okay to even change this API to return the FreeSpace, as the other place it is used is for Index Vacuum, so even if we don't have any intention to print such a message for index in this patch,
> but similar information could be useful there as well to suggest a user that index has lot of free space.

Enclosed please find the new patch which get the FreeSpace for one relation from the return of FreeSpaceMapVacuum() function. This function and the fsm_vacuum_page() function have been slightly modified to get the FreeSpace and no I/O burden increasing. The little side-effect is it will calculate FreeSpace for every table even the table is very small.

Kind regards
Jing Wang
Fujitsu Australia

Attachment Content-Type Size
vacuum_v2.patch application/octet-stream 6.7 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-20 03:44:58
Message-ID: CAA4eK1KV0-PVzxravC613QYf2oO1fg3X_U3cSTNz_TBnpATfkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 19, 2014 at 6:25 AM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
> On Friday, 14 March 2014 2:42 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> I think it might be okay to even change this API to return the FreeSpace, as the other place it is used is for Index Vacuum, so even if we don't have any intention to print such a message for index in this patch,
>> but similar information could be useful there as well to suggest a user that index has lot of free space.
>
> Enclosed please find the new patch which get the FreeSpace for one relation from the return of FreeSpaceMapVacuum() function. This function and the fsm_vacuum_page() function have been slightly modified to get the FreeSpace and no I/O burden increasing. The little side-effect is it will calculate FreeSpace for every table even the table is very small.

I think that can also be avoided, because by the time you call
FreeSpaceMapVacuum(), you already have the required information
based on which you can decide not to ask for freespace if
required.

Can't we avoid the new calculation you have added in
fsm_vacuum_page(), as this function already updates the size,
so might be we can get it from current calculation done in
function.

+ #define RELPAGES_VALUES_THRESHOLD 1000
+ #define FREESPACE_PERCENTAGE_THRESHOLD 0.5
Is there any basis to define above hash defines, we already
have one number similar to above for deciding Truncate of relation.

In anycase, I think the patch's direction is better than previous and
can be further discussed/reviewed during next CF, as it's already
quite late for 9.4.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-24 05:45:06
Message-ID: E78797E1DE682D4C817419A2F599CCFE0135B3@SYD1214
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thursday, 20 March 2014 2:45 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>On Wed, Mar 19, 2014 at 6:25 AM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
>> On Friday, 14 March 2014 2:42 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>> I think it might be okay to even change this API to return the
>>> FreeSpace, as the other place it is used is for Index Vacuum, so even if we don't have any intention to print such a message for index in this patch, but similar information could be useful there as well to suggest a user that index has lot of free space.
>>
>> Enclosed please find the new patch which get the FreeSpace for one relation from the return of FreeSpaceMapVacuum() function. This function and the fsm_vacuum_page() function have been slightly modified to get the FreeSpace and no I/O burden increasing. The little side-effect is it will calculate FreeSpace for every table even the table is very small.

>I think that can also be avoided, because by the time you call FreeSpaceMapVacuum(), you already have the required information based on which you can decide not to ask for freespace if required.

That will make the function FreeSpaceMapVacuum() look strange and be difficult to understand, so I think keeping the existing patch is better. Cause the number of pages of FSM file is small , calculating FreeSpace for small table will not bring the burden in performance.

>Can't we avoid the new calculation you have added in fsm_vacuum_page(), as this function already updates the size, so might be we can get it from current calculation done in function.

Sorry, I can't find that information from the current calculation. Could you give me some more detail information?

Kind regards
Jing Wang
Fujitsu Australia


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-26 06:02:41
Message-ID: CA+TgmoasSohUb5NRCe8ttSPZdbnwORK3Y5CHc6aFvzExs4wKSg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Mar 9, 2014 at 5:28 PM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
> Enclosed is the patch to implement the requirement that issue log message to
> suggest VACUUM FULL if a table is nearly empty.
>
> The requirement comes from the Postgresql TODO list.
>
> If the relpage of the table > RELPAGES_VALUES_THRESHOLD(default 1000) then
> the table is considered to be large enough.
>
> If the free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
> then the table is considered to have large numbers of unused rows.

I'm not sure that we want people to automatically VF a table just
because it's 2x bloated. Doesn't it depend on the table size? And in
sort of a funny way, too, like, if the tables is small, 2x bloat is
not wasting much disk space, but getting rid of it is probably easy,
so maybe you should - but if the table is a terabyte, even 50% bloat
might be pretty intolerable, but whether it makes sense to try to get
rid of it depends on your access pattern. I'm not really too sure
whether it makes sense to try to make an automated recommendation
here, or maybe only in egregious cases.

> The free_space is calculated by reading the details from the FSM pages. This
> may increase the IO, but expecting very less FSM pages thus it shouldn't
> cause

The free space map can show more or less than the real amount of free
space, can't it? I worry about making a recommendation that might
turn out to be wildly inaccurate...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-31 04:35:36
Message-ID: CAA4eK1JCzN5obtcMjJdGvwWtTkMwDAZtt6w2sZwoOdYgst6iYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 26, 2014 at 11:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Mar 9, 2014 at 5:28 PM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
>> Enclosed is the patch to implement the requirement that issue log message to
>> suggest VACUUM FULL if a table is nearly empty.
>>
>> The requirement comes from the Postgresql TODO list.
>>
>> If the relpage of the table > RELPAGES_VALUES_THRESHOLD(default 1000) then
>> the table is considered to be large enough.
>>
>> If the free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
>> then the table is considered to have large numbers of unused rows.
>
> I'm not sure that we want people to automatically VF a table just
> because it's 2x bloated. Doesn't it depend on the table size? And in
> sort of a funny way, too, like, if the tables is small, 2x bloat is
> not wasting much disk space, but getting rid of it is probably easy,
> so maybe you should - but if the table is a terabyte, even 50% bloat
> might be pretty intolerable, but whether it makes sense to try to get
> rid of it depends on your access pattern. I'm not really too sure
> whether it makes sense to try to make an automated recommendation
> here, or maybe only in egregious cases.

I think here main difficulty is to decide when it will be considered good
to display such a message. As you said, that it depends on access pattern
whether 50% bloat is tolerable or not, so one way could be to increase the
bloat limit and table size threshold to higher value (bloat - 80%,
table_size = 500M) where it would make sense to recommend VF for all cases
or another way could be to consider using some auto vacuum threshold parameter
like autovacuum_vacuum_scale_factor to calculate threshold value for issuing
this message. I think parameter like scale factor can make sense as to an extent
this parameter is an indicative of how much dead space percentage is tolerable
for user.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-03-31 18:54:10
Message-ID: CA+TgmoZ+gf4HSpBVzjTkpzQTBAs5gF-16-Qz_AZ15_cTRuB51A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Mar 31, 2014 at 12:35 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Mar 26, 2014 at 11:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Sun, Mar 9, 2014 at 5:28 PM, Wang, Jing <jingw(at)fast(dot)au(dot)fujitsu(dot)com> wrote:
>>> Enclosed is the patch to implement the requirement that issue log message to
>>> suggest VACUUM FULL if a table is nearly empty.
>>>
>>> The requirement comes from the Postgresql TODO list.
>>>
>>> If the relpage of the table > RELPAGES_VALUES_THRESHOLD(default 1000) then
>>> the table is considered to be large enough.
>>>
>>> If the free_space/total_space > FREESPACE_PERCENTAGE_THRESHOLD(default 0.5)
>>> then the table is considered to have large numbers of unused rows.
>>
>> I'm not sure that we want people to automatically VF a table just
>> because it's 2x bloated. Doesn't it depend on the table size? And in
>> sort of a funny way, too, like, if the tables is small, 2x bloat is
>> not wasting much disk space, but getting rid of it is probably easy,
>> so maybe you should - but if the table is a terabyte, even 50% bloat
>> might be pretty intolerable, but whether it makes sense to try to get
>> rid of it depends on your access pattern. I'm not really too sure
>> whether it makes sense to try to make an automated recommendation
>> here, or maybe only in egregious cases.
>
> I think here main difficulty is to decide when it will be considered good
> to display such a message. As you said, that it depends on access pattern
> whether 50% bloat is tolerable or not, so one way could be to increase the
> bloat limit and table size threshold to higher value (bloat - 80%,
> table_size = 500M) where it would make sense to recommend VF for all cases
> or another way could be to consider using some auto vacuum threshold parameter
> like autovacuum_vacuum_scale_factor to calculate threshold value for issuing
> this message. I think parameter like scale factor can make sense as to an extent
> this parameter is an indicative of how much dead space percentage is tolerable
> for user.

I don't think there's a very direct relationship between those things.
One of the problems we repeatedly encounter is that the scale factor
only governs when the table becomes eligible to be vacuumed; once that
happens, it takes some amount of time - ideally <1 minute but more if
all workers are busy or if autovacuum_naptime has unfortunately been
increased - for the vacuum to start, and then more time after that for
the vacuum to finish. I think the latter is really the kicker. Even
if your system is relatively well-tuned, a big table takes a long time
to vacuum, and you're going to continue accumulating bloat while the
vacuum is running.

Another aspect of my ambivalence about this is that VACUUM FULL tends
to get overused as it is. If we start making automated
recommendations in that direction, it might cause people to lean that
way even further, which would not, on the whole, be a good thing. On
the other hand, if the table is 80% dead space, it's a pretty good bet
that a VACUUM FULL is needed. Even there, though, the VACUUM FULL may
be a pretty temporary fix unless the user also fixes the underlying
issue that caused the table bloat to accumulate in the first place.
Sometimes bloat is caused by a one-off issue, like one long-running
query. But sometimes it's caused by something systematic, like
setting the cost limit too low or the nap time too high. Just telling
the user to run VACUUM FULL is likely to make the user conclude that
"PostgreSQL sucks, I have to keep running VACUUM FULL all the time,
taking a full-table lock". Of course, really giving the user a useful
level of information here is probably impractical in a log message
anyway, but that doesn't mean giving them too little information to do
something useful is better.

Yet another thing that bothers me about this is that the table might
already be getting vacuumed very frequently. If you start getting
this message from autovac once per minute, you're going to think
that's pretty stupid - especially after you try VACUUM FULL and the
problem comes right back because of constant update pressure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-04-01 04:58:48
Message-ID: CAA4eK1KDxD_qNED_yCM_6KZzKgF+AdM5HenwsAskEJkbfdkYVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Apr 1, 2014 at 12:24 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Mar 31, 2014 at 12:35 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> On Wed, Mar 26, 2014 at 11:32 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I'm not really too sure
>>> whether it makes sense to try to make an automated recommendation
>>> here, or maybe only in egregious cases.
>>
>> I think here main difficulty is to decide when it will be considered good
>> to display such a message. As you said, that it depends on access pattern
>> whether 50% bloat is tolerable or not, so one way could be to increase the
>> bloat limit and table size threshold to higher value (bloat - 80%,
>> table_size = 500M) where it would make sense to recommend VF for all cases
>> or another way could be to consider using some auto vacuum threshold parameter
>> like autovacuum_vacuum_scale_factor to calculate threshold value for issuing
>> this message. I think parameter like scale factor can make sense as to an extent
>> this parameter is an indicative of how much dead space percentage is tolerable
>> for user.
>
>
>
> Another aspect of my ambivalence about this is that VACUUM FULL tends
> to get overused as it is. If we start making automated
> recommendations in that direction, it might cause people to lean that
> way even further, which would not, on the whole, be a good thing. On
> the other hand, if the table is 80% dead space, it's a pretty good bet
> that a VACUUM FULL is needed. Even there, though, the VACUUM FULL may
> be a pretty temporary fix unless the user also fixes the underlying
> issue that caused the table bloat to accumulate in the first place.
> Sometimes bloat is caused by a one-off issue, like one long-running
> query. But sometimes it's caused by something systematic, like
> setting the cost limit too low or the nap time too high.

Right, but it can happen even if the settings for auto vacuum are done
considering the general usage but as a one of case there is sudden spike in
update in which case it might make sense to give such a message.
However if this message keep appearing in the log every now and then,
it will mean that autovacumm settings are not appropriate for the load.
I think it will be difficult to know the exact reason for dead space, do you
think it can make sense if the message indicates (as Hint) such that,
if user observes this message repeatedly the autovacuum settings are
not as per load.
Another way could be to update docs to indicate the same.

> Just telling
> the user to run VACUUM FULL is likely to make the user conclude that
> "PostgreSQL sucks, I have to keep running VACUUM FULL all the time,
> taking a full-table lock".

Agreed user can conclude such things, but even if he figures that out himself
(which is quite possible), he will reach to same conclusion unless he is aware
that the reason could be the autovacuum settings.

Another thought that occurred to me is might be giving such an information for
Index can be more useful as there are always more chances for index bloat
especially in context of below information from docs.
"B-tree index pages that have become completely empty are reclaimed for re-use.
However, there is still a possibility of inefficient use of space: if
all but a few index
keys on a page have been deleted, the page remains allocated. Therefore, a usage
pattern in which most, but not all, keys in each range are eventually
deleted will see
poor use of space. For such usage patterns, periodic reindexing is recommended."

There are certain usage pattern's like always inserting data in particular
(increasing/decreasing) order which can lead to bloat in above context.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-07-14 10:11:15
Message-ID: 20140714101115.GA21584@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi.

Do we have any consensus on this patch?

I took a quick look at it because no review was posted. The patch itself
does what it claims to, but from the discussion it doesn't seem like we
want the feature; or perhaps we only don't want it in its present form.

So which is more appropriate, returned with feedback or rejected?

(In the latter case, the TODO item should also be removed.)

-- Abhijit


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-07-14 11:55:34
Message-ID: 11000.1405338934@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> Do we have any consensus on this patch?

> I took a quick look at it because no review was posted. The patch itself
> does what it claims to, but from the discussion it doesn't seem like we
> want the feature; or perhaps we only don't want it in its present form.

Re-reading the thread quickly, it seemed like there was considerable
pushback about the cost of collecting the stats, and worries about whether
it wouldn't just be log spam. But I think the opposite of the latter
concern is also valid, namely that people who could benefit from the
warning are not going to see it because they don't peruse the postmaster
log carefully/at all. That's a generic problem for warning messages
emitted by background tasks, which we ought to think about how to fix.
In the meantime though it seems like this patch is far more likely to be
annoying than helpful.

> So which is more appropriate, returned with feedback or rejected?
> (In the latter case, the TODO item should also be removed.)

I'd vote for rejecting and annotating the TODO item with a link to
this thread. And maybe taking off the "easy" notation. I think the
TODO item is reflecting a real usability issue, but solving it usefully
is quite a bit harder than it looks.

regards, tom lane


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "Wang, Jing" <jingw(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: issue log message to suggest VACUUM FULL if a table is nearly empty
Date: 2014-07-14 12:09:56
Message-ID: 20140714120956.GG325@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2014-07-14 07:55:34 -0400, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>
> I'd vote for rejecting and annotating the TODO item with a link to
> this thread.

I've marked the patch as rejected and edited the TODO list to remove the
"easy". There was already a link to this thread there.

Thank you.

-- Abhijit