Re: [PATCHES] smartvacuum() instead of autovacuum

Lists: pgsql-hackerspgsql-patches
From: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
To: <pgsql-patches(at)postgresql(dot)org>
Subject: smartvacuum() instead of autovacuum
Date: 2006-10-22 11:12:37
Message-ID: 200610221114.k9MBERQa007930@mbox31.po.2iij.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi,

I am trying to implement smartvacuum(), which do vacuum only tables having
many dead rows, instead of autovacuum.
I read sources of autovacuum and pgstat, and found it could be if
smartvacuum() gets information from pgstat like autovacuum does.

concerns:
1. autovacuum makes vacuum nodes itself, while smartvacuum() will do it by
SPI for simplicity.
2. autovacuum allocate its own MemoryContext. Does smartvacuum() need it as
well?
3. autovacuum takes new transaction. Why?

questions:
1. Why isn't there some function like pg_stat_get_dead_rows(). if there is,
smartvacuum() can be implemented by even pl/pgsql
2. autovacuum can actually conflict with VACUUM ANALYZE on another session?

Any answers and opinions are appreciated.
Regards,

Hitoshi Harada


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-22 13:08:28
Message-ID: 200610221508.28788.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hitoshi Harada wrote:
> I am trying to implement smartvacuum(), which do vacuum only tables
> having many dead rows, instead of autovacuum.

How is this different from what autovacuum does?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
To: "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 01:02:49
Message-ID: 200610230104.k9N14dPM054839@mbox31.po.2iij.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Hi, Peter,

> How is this different from what autovacuum does?

My application needs to do vacuum by itself, while
autovacuum does it as daemon.
The database is updated so frequently that
normal vacuum costs too much and tables to be updated are
not so many as the whole database is vacuumed.
I want to use autovacuum except the feature of daemon,
but want to control when to vacuum and which table to vacuum.
So, nothing is different between autovacuum and smartvacuum(),
but former is daemon and later is user function.

c.f. I ran autovacuum before, and my batch script did vacuum while
autovacuum did one as well on the other session at the same time.
I found the vacuum analyze conflicts each other sometime...
so I want to control vacuum my self.
http://archives.postgresql.org/pgsql-bugs/2002-12/msg00198.php
http://archives.postgresql.org/pgsql-general/2004-05/msg00015.php

Regards,

Hitoshi Harada

> -----Original Message-----
> From: Peter Eisentraut [mailto:peter_e(at)gmx(dot)net]
> Sent: Sunday, October 22, 2006 10:08 PM
> To: Hitoshi Harada
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
>
> Hitoshi Harada wrote:
> > I am trying to implement smartvacuum(), which do vacuum only tables
> > having many dead rows, instead of autovacuum.
>
> How is this different from what autovacuum does?
>
> --
> Peter Eisentraut
> http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
Cc: "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 02:09:52
Message-ID: 10306.1161569392@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com> writes:
>> How is this different from what autovacuum does?

> My application needs to do vacuum by itself, while
> autovacuum does it as daemon.
> The database is updated so frequently that
> normal vacuum costs too much and tables to be updated are
> not so many as the whole database is vacuumed.
> I want to use autovacuum except the feature of daemon,
> but want to control when to vacuum and which table to vacuum.
> So, nothing is different between autovacuum and smartvacuum(),
> but former is daemon and later is user function.

This seems completely unconvincing. What are you going to do that
couldn't be done by autovacuum?

regards, tom lane


From: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 02:18:39
Message-ID: 200610230220.k9N2KUeZ051790@mbox31.po.2iij.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Ok,

But my point is, autovacuum may corrupt with vacuum analyze command
on another session. My intention of smartvacuum() is based on this.
Any solution for this??

Regards,

Hitoshi Harada

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: Monday, October 23, 2006 11:10 AM
> To: Hitoshi Harada
> Cc: 'Peter Eisentraut'; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum
>
> "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com> writes:
> >> How is this different from what autovacuum does?
>
> > My application needs to do vacuum by itself, while
> > autovacuum does it as daemon.
> > The database is updated so frequently that
> > normal vacuum costs too much and tables to be updated are
> > not so many as the whole database is vacuumed.
> > I want to use autovacuum except the feature of daemon,
> > but want to control when to vacuum and which table to vacuum.
> > So, nothing is different between autovacuum and smartvacuum(),
> > but former is daemon and later is user function.
>
> This seems completely unconvincing. What are you going to do that
> couldn't be done by autovacuum?
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Hitoshi Harada <hitoshi_harada(at)forcia(dot)com>
Cc: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>, 'Peter Eisentraut' <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 18:36:28
Message-ID: 20061023183628.GR26892@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

If the decision to vacuum based on autovacuum criteria is good enough
for you then I think you should just focus on getting autovac to do what
you want it to do. Perhaps you just need to decrease the sleep time to a
few seconds, so that autovac will quickly detect when something needs to
be vacuumed.

The only case I can think of where autovac might not work as well as
smartvacuum would be if you had a lot of databases in the cluster, since
autovacuum will only vacuum one database at a time.

On Mon, Oct 23, 2006 at 11:18:39AM +0900, Hitoshi Harada wrote:
> Ok,
>
> But my point is, autovacuum may corrupt with vacuum analyze command
> on another session. My intention of smartvacuum() is based on this.
> Any solution for this??
>
> Regards,
>
>
> Hitoshi Harada
>
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org
> > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> > Sent: Monday, October 23, 2006 11:10 AM
> > To: Hitoshi Harada
> > Cc: 'Peter Eisentraut'; pgsql-hackers(at)postgresql(dot)org
> > Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum
> >
> > "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com> writes:
> > >> How is this different from what autovacuum does?
> >
> > > My application needs to do vacuum by itself, while
> > > autovacuum does it as daemon.
> > > The database is updated so frequently that
> > > normal vacuum costs too much and tables to be updated are
> > > not so many as the whole database is vacuumed.
> > > I want to use autovacuum except the feature of daemon,
> > > but want to control when to vacuum and which table to vacuum.
> > > So, nothing is different between autovacuum and smartvacuum(),
> > > but former is daemon and later is user function.
> >
> > This seems completely unconvincing. What are you going to do that
> > couldn't be done by autovacuum?
> >
> > regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: don't forget to increase your free space map settings
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Hitoshi Harada <hitoshi_harada(at)forcia(dot)com>, "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 19:08:03
Message-ID: 27345.1161630483@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

"Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> The only case I can think of where autovac might not work as well as
> smartvacuum would be if you had a lot of databases in the cluster, since
> autovacuum will only vacuum one database at a time.

It's conceivable that it'd make sense to allow multiple autovac
processes running in parallel. (The infrastructure part of this is easy
enough, the hard part is keeping them from all deciding to vacuum the
same table.)

One reason we have not done that already is the thought that multiple
vacuum processes would suck too much I/O to be reasonable. Now you
could dial back their resource demands with the cost-delay settings,
but it's not clear that ten autovacs running at one-tenth speed are
better than one autovac using all the cycles you can spare. Usually
I think it's best if a vacuum transaction finishes as fast as it can.

In any case, these exact same concerns would apply to manual vacuums
or a combination of manual and auto vacuum.

regards, tom lane


From: Matthew O'Connor <matthew(at)zeut(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>, Hitoshi Harada <hitoshi_harada(at)forcia(dot)com>, 'Peter Eisentraut' <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 22:39:23
Message-ID: 453D449B.50404@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
>> The only case I can think of where autovac might not work as well as
>> smartvacuum would be if you had a lot of databases in the cluster, since
>> autovacuum will only vacuum one database at a time.
>
> It's conceivable that it'd make sense to allow multiple autovac
> processes running in parallel. (The infrastructure part of this is easy
> enough, the hard part is keeping them from all deciding to vacuum the
> same table.)
>
> One reason we have not done that already is the thought that multiple
> vacuum processes would suck too much I/O to be reasonable. Now you
> could dial back their resource demands with the cost-delay settings,
> but it's not clear that ten autovacs running at one-tenth speed are
> better than one autovac using all the cycles you can spare. Usually
> I think it's best if a vacuum transaction finishes as fast as it can.

I think this is one of the reasons table specific delay settings were
designed in from the beginning. I think the main use cases for multiple
vacuums at once are:
1) Vacuum per table space assuming each table space is on a different
drive with it's own I/O.
2) the frequently updated table that can't wait to be vacuumed while a
large table is being vacuumed. In this case if you set a system default
delay setting and set a more aggressive table specific delay setting for
your hot spot tables then multiple vacuums become a clear win. This is
an important case that I hope we handle soon. At this point it's one of
the main failings of the current autovacuum system.


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hitoshi Harada <hitoshi_harada(at)forcia(dot)com>, 'Peter Eisentraut' <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-23 23:08:55
Message-ID: 20061023230854.GC26892@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, Oct 23, 2006 at 03:08:03PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > The only case I can think of where autovac might not work as well as
> > smartvacuum would be if you had a lot of databases in the cluster, since
> > autovacuum will only vacuum one database at a time.
>
> It's conceivable that it'd make sense to allow multiple autovac
> processes running in parallel. (The infrastructure part of this is easy
> enough, the hard part is keeping them from all deciding to vacuum the
> same table.)

It might be worth creating a generic framework that prevents multiple
vacuums from hitting a table at once, autovac or not.

> One reason we have not done that already is the thought that multiple
> vacuum processes would suck too much I/O to be reasonable. Now you
> could dial back their resource demands with the cost-delay settings,
> but it's not clear that ten autovacs running at one-tenth speed are
> better than one autovac using all the cycles you can spare. Usually
> I think it's best if a vacuum transaction finishes as fast as it can.

There's other things that would benefit from having some idea on what IO
resources are available. For example, having a separate bgwriter (or
reader) for each set of physical volumes. So a means of grouping
tablespaces wouldn't hurt.

> In any case, these exact same concerns would apply to manual vacuums
> or a combination of manual and auto vacuum.

Well, the advantage to manual vacuums is that you can tune things to
utilize multiple arrays...
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hitoshi Harada <hitoshi_harada(at)forcia(dot)com>, "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-24 01:22:50
Message-ID: 20061024012250.GA26593@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jim C. Nasby wrote:
> On Mon, Oct 23, 2006 at 03:08:03PM -0400, Tom Lane wrote:
> > "Jim C. Nasby" <jim(at)nasby(dot)net> writes:
> > > The only case I can think of where autovac might not work as well as
> > > smartvacuum would be if you had a lot of databases in the cluster, since
> > > autovacuum will only vacuum one database at a time.
> >
> > It's conceivable that it'd make sense to allow multiple autovac
> > processes running in parallel. (The infrastructure part of this is easy
> > enough, the hard part is keeping them from all deciding to vacuum the
> > same table.)
>
> It might be worth creating a generic framework that prevents multiple
> vacuums from hitting a table at once, autovac or not.

That one is easy, because vacuum gets a lock on the affected table that
conflicts with itself. The problem is that the second vacuum would
actually wait for the first to finish.

A naive idea is to use ConditionalLockAcquire, and if it fails just skip
the table.

> > One reason we have not done that already is the thought that multiple
> > vacuum processes would suck too much I/O to be reasonable. Now you
> > could dial back their resource demands with the cost-delay settings,
> > but it's not clear that ten autovacs running at one-tenth speed are
> > better than one autovac using all the cycles you can spare. Usually
> > I think it's best if a vacuum transaction finishes as fast as it can.

In the scenario where one table is huge and another is very small, it
can certainly be useful to vacuum the small table several times while
the huge one has only been vacuumed once. For that you definitively
need the ability to run parallel vacuums.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
To: "'Jim C(dot) Nasby'" <jim(at)nasby(dot)net>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Peter Eisentraut'" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] smartvacuum() instead of autovacuum
Date: 2006-10-24 09:52:01
Message-ID: 200610240953.k9O9rkk6040838@mbox31.po.2iij.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> If the decision to vacuum based on autovacuum criteria is good enough
> for you then I think you should just focus on getting autovac to do what
> you want it to do. Perhaps you just need to decrease the sleep time to a
> few seconds, so that autovac will quickly detect when something needs to
> be vacuumed.

Thanks, I'll do it.
My database is updated frequently all the day and
runs big building process a day.
Almost all the day autovac is ok but
in the big building process autovac annoys it,
so I wished there might be the way to order autovac to do its process.

Hitoshi Harada

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Jim C. Nasby
> Sent: Tuesday, October 24, 2006 3:36 AM
> To: Hitoshi Harada
> Cc: 'Tom Lane'; 'Peter Eisentraut'; pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum
>
> If the decision to vacuum based on autovacuum criteria is good enough
> for you then I think you should just focus on getting autovac to do what
> you want it to do. Perhaps you just need to decrease the sleep time to a
> few seconds, so that autovac will quickly detect when something needs to
> be vacuumed.
>
> The only case I can think of where autovac might not work as well as
> smartvacuum would be if you had a lot of databases in the cluster, since
> autovacuum will only vacuum one database at a time.
>
> On Mon, Oct 23, 2006 at 11:18:39AM +0900, Hitoshi Harada wrote:
> > Ok,
> >
> > But my point is, autovacuum may corrupt with vacuum analyze command
> > on another session. My intention of smartvacuum() is based on this.
> > Any solution for this??
> >
> > Regards,
> >
> >
> > Hitoshi Harada
> >
> > > -----Original Message-----
> > > From: pgsql-hackers-owner(at)postgresql(dot)org
> > > [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> > > Sent: Monday, October 23, 2006 11:10 AM
> > > To: Hitoshi Harada
> > > Cc: 'Peter Eisentraut'; pgsql-hackers(at)postgresql(dot)org
> > > Subject: Re: [HACKERS] [PATCHES] smartvacuum() instead of autovacuum
> > >
> > > "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com> writes:
> > > >> How is this different from what autovacuum does?
> > >
> > > > My application needs to do vacuum by itself, while
> > > > autovacuum does it as daemon.
> > > > The database is updated so frequently that
> > > > normal vacuum costs too much and tables to be updated are
> > > > not so many as the whole database is vacuumed.
> > > > I want to use autovacuum except the feature of daemon,
> > > > but want to control when to vacuum and which table to vacuum.
> > > > So, nothing is different between autovacuum and smartvacuum(),
> > > > but former is daemon and later is user function.
> > >
> > > This seems completely unconvincing. What are you going to do that
> > > couldn't be done by autovacuum?
> > >
> > > regards, tom lane
> > >
> > > ---------------------------(end of
broadcast)---------------------------
> > > TIP 5: don't forget to increase your free space map settings
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: don't forget to increase your free space map settings
> >
>
> --
> Jim Nasby jim(at)nasby(dot)net
> EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq