Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 13:15:08
Message-ID: CABOikdM_Sk05jTFa+v0NQiKTXLRsBcnH7zteqeMfttnbUgMXYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 6:10 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>wrote:

>
>
>
>>
>> 2012-12-05 00:44:23 EET LOG: automatic analyze of table
>> "fleet.fleet.vehicle_position" system usage: CPU 4.46s/0.61u sec elapsed
>> 465.09 sec
>>
>
> This is the interesting piece of information. So its the auto analyze
> thats causing all
> the IO activity. That explains why it was a read only IO that we noticed
> earlier. Whats
> happening here, and something that changed from 8.4 to 9.1, is that
> whenever the parent
> table is analyzed, the child tables are also automatically analyzed. I
> don't remember the
> rational for doing this change, but in your case the analyze on the parent
> table itself is
> quite useless because even though you inserting a large number of new
> tuples, you are
> also immediately deleting them. I don't want to comment on the design
> aspect of that,
> but you should be able to fix this problem by disabling auto-analyze on
> the parent table.
>
> Having said that, I don't see an easy way to just disable auto-analyze on
> a table. You can
> run ALTER TABLE foo SET (autovacuum_enabled = false), but that would also
> disable
> auto-vacuum, which you certainly don't want to do because the parent table
> would just
> keep growing.
>
> You can set autovacuum_analyze_threshold to an artificially high value
> to mitigate the
> problem and reduce the frequency of auto-analyze on the table or see if
> you can completely
> avoid insert/delete on the parent table.
>
> ALTER TABLE vehicle_position SET (autovacuum_analyze_threshold =
> 2000000000);
>
>
While looking at this particular case on -general, I realized that there is
no way to *only* disable auto-analyze on a table. While one can cheat like
what I suggested to the OP by setting threshold very high, I think it will
be useful to be able to just off analyze. In this particular case, the OP
is inserting and then deleting the same rows from the parent table, thus
keeping it almost empty. Of course, he would want to run auto-vacuum on the
table to remove the dead rows. Usually auto-analyze would have returned
quite fast, especially because we vacuum a table first and then analyze it.
But in this case, since the table is a parent of a number of large child
tables, we end up analyzing the child tables too, which takes significantly
longer time and is quite unnecessary because in this case the activity on
the parent table must not have changed any stats for the child tables.

A new reloption such as autovacuum_analyze_enabled is what we need.

Comments ?

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 15:34:33
Message-ID: 14670.1359732873@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> While looking at this particular case on -general, I realized that there is
> no way to *only* disable auto-analyze on a table. While one can cheat like
> what I suggested to the OP by setting threshold very high, I think it will
> be useful to be able to just off analyze. In this particular case, the OP
> is inserting and then deleting the same rows from the parent table, thus
> keeping it almost empty. Of course, he would want to run auto-vacuum on the
> table to remove the dead rows. Usually auto-analyze would have returned
> quite fast, especially because we vacuum a table first and then analyze it.
> But in this case, since the table is a parent of a number of large child
> tables, we end up analyzing the child tables too, which takes significantly
> longer time and is quite unnecessary because in this case the activity on
> the parent table must not have changed any stats for the child tables.

> A new reloption such as autovacuum_analyze_enabled is what we need.

This seems to me to be a wart that doesn't fix the actual problem ---
the actual problem is to make the autovac daemon smarter about when an
inheritance-tree ANALYZE pass is needed. That should be done somehow
based on the total row churn across the parent + children. Looking
at the parent only, as we do now, can result in analyzing too often
(the OP's case) or too seldom (the much more common case). A manual
"off" switch fixes only the less common case, and requires user
intervention that we'd be better off without.

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 15:37:21
Message-ID: 20130201153721.GE4918@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee escribió:

> While looking at this particular case on -general, I realized that there is
> no way to *only* disable auto-analyze on a table. While one can cheat like
> what I suggested to the OP by setting threshold very high, I think it will
> be useful to be able to just off analyze. In this particular case, the OP
> is inserting and then deleting the same rows from the parent table, thus
> keeping it almost empty. Of course, he would want to run auto-vacuum on the
> table to remove the dead rows. Usually auto-analyze would have returned
> quite fast, especially because we vacuum a table first and then analyze it.
> But in this case, since the table is a parent of a number of large child
> tables, we end up analyzing the child tables too, which takes significantly
> longer time and is quite unnecessary because in this case the activity on
> the parent table must not have changed any stats for the child tables.
>
> A new reloption such as autovacuum_analyze_enabled is what we need.

I was thinking in this option just three days ago, so yeah.

I think we also want an option to turn off just vacuum.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 15:54:33
Message-ID: CABOikdP97uMQ95q16qScFu0PYk6H5oHHyKVx8=t60mCW=pzg4Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 9:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:

>> A new reloption such as autovacuum_analyze_enabled is what we need.
>
> This seems to me to be a wart that doesn't fix the actual problem ---

IMHO this case is just an example, but I'm sure there would be similar
such examples which may not involve inheritance. For example, say user
has a very large table which is updated very frequently but not in a
way that his query plans will be affected. The user may want to turn
auto analyze in such cases. And given that we allow the user to
control all other parameters, I don't understand why we would not let
him turn it off completely.

There is another problem that I noticed while looking at this case.
The analyze took close to 500sec on a fairly good hardware (40GB RAM,
10K rpm disks on RAID10) because many large child tables were scanned
at once. We analyze all of them in a single transaction. This long
running transaction will cause a lot of bloat for heavily updated
tables since HOT will fail to keep up. I wonder if we should set up
the child tables in the tableoid_list just like we do for toast tables
so that each table is analyzed in its own transaction. This is also
important because partitioning will typically involve very large
tables.

Of course, if we could ever run analyze on a single table in multiple
smaller transactions, that will be even better. But I'm not sure if
thats feasible.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 16:16:38
Message-ID: CABOikdOCm3tJDCkgcfX_enxNXRa1e56250sikhb6atCVcOgF_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 9:07 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Pavan Deolasee escribió:
>
>>
>> A new reloption such as autovacuum_analyze_enabled is what we need.
>
> I was thinking in this option just three days ago, so yeah.
>
> I think we also want an option to turn off just vacuum.
>

+1. I think that will be useful too in some situations, especially
when user wants to let autovacuum take care of some tables and
manually vacuum the rest.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 17:23:14
Message-ID: 20130201172314.GA2371@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 12:37:21PM -0300, Alvaro Herrera wrote:
> Pavan Deolasee escribió:
>
> > While looking at this particular case on -general, I realized that there is
> > no way to *only* disable auto-analyze on a table. While one can cheat like
> > what I suggested to the OP by setting threshold very high, I think it will
> > be useful to be able to just off analyze. In this particular case, the OP
> > is inserting and then deleting the same rows from the parent table, thus
> > keeping it almost empty. Of course, he would want to run auto-vacuum on the
> > table to remove the dead rows. Usually auto-analyze would have returned
> > quite fast, especially because we vacuum a table first and then analyze it.
> > But in this case, since the table is a parent of a number of large child
> > tables, we end up analyzing the child tables too, which takes significantly
> > longer time and is quite unnecessary because in this case the activity on
> > the parent table must not have changed any stats for the child tables.
> >
> > A new reloption such as autovacuum_analyze_enabled is what we need.
>
> I was thinking in this option just three days ago, so yeah.
>
> I think we also want an option to turn off just vacuum.

Are these TODO items?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-01 17:33:05
Message-ID: CABOikdPxbrRytmXGXXmi52aJgWpT5=Uv5RBkpGw1VXjkto+oOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 10:53 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Fri, Feb 1, 2013 at 12:37:21PM -0300, Alvaro Herrera wrote:
>
>> > A new reloption such as autovacuum_analyze_enabled is what we need.
>>
>> I was thinking in this option just three days ago, so yeah.
>>
>> I think we also want an option to turn off just vacuum.
>
> Are these TODO items?
>

I can write a patch in next couple of days if we are willing to accept
for this release. I think it should be fairly easy and non-intrusive.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-02 15:19:49
Message-ID: CABrmO8rP7b21-zQfymb7QUTioiJdoNE5Q+ipX-W_EwHdRiYtCQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 5:54 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>wrote:

> There is another problem that I noticed while looking at this case.
> The analyze took close to 500sec on a fairly good hardware (40GB RAM,
> 10K rpm disks on RAID10) because many large child tables were scanned
> at once.
>

Just a small correction to get a good benchmark

The ~500 sec analyze time was on a test VM running on a i5 2.4 Ghz with 2
dedicate cores, 4 GB of RAM and stored on a notebook HDD. The partitioned
data was about 80GB.

On our production environment (described by Pavan) it took ~90 second for
~55GB of data in 8 child/partition tables (we stopped the import of
partitioned data when we discovered this issue - we COPYed and TRUNCATEd
partitions to speed up upgrade procedure from 8.4 to 9.1 by
pg_dump/pg_restore).

Vlad


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-04 15:35:17
Message-ID: CA+Tgmoad44xYo7MBdN2u3TywHz+Bxih6m3SwRU5JN0-kvLgzww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 12:33 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> I can write a patch in next couple of days if we are willing to accept
> for this release. I think it should be fairly easy and non-intrusive.

I think it's too late to consider this for 9.3, but I think we should
entertain it for 9.4.

The biggest problem I see is that the naming of the new reloptions
might end up being something kind of unintuitive, like
autovacuum_vacuum_enabled and autovacuum_analyze_enabled. You need a
degree in PostgreSQLology to understand what that means, but I haven't
got a better idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-05 08:07:19
Message-ID: CABOikdMpLKKWcBrW88=Yf5ujasX7jJFTYyvn3x2O72j8bTMRuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 9:05 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Feb 1, 2013 at 12:33 PM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> I can write a patch in next couple of days if we are willing to accept
>> for this release. I think it should be fairly easy and non-intrusive.
>
> I think it's too late to consider this for 9.3, but I think we should
> entertain it for 9.4.
>

Its going to be fairly non

> The biggest problem I see is that the naming of the new reloptions
> might end up being something kind of unintuitive, like
> autovacuum_vacuum_enabled and autovacuum_analyze_enabled. You need a
> degree in PostgreSQLology to understand what that means, but I haven't
> got a better idea.
>

Yeah, I also thought of those two names. May be autovacuum should have
been called autovacanalyze or something like that. But that's too
late. May be someday we overhaul the maintenance activities and call
it bgmaintainer ? Or have two different threads to do autovacuum and
autoanalyze ?

For now, I can't think of anything better than
autovacuum_vacuum_enabled and autovacuum_analyze_enabled.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-05 08:11:05
Message-ID: CABOikdPnqVe5v9ZZKROwNcLE3S6Jx-KS-mi8D5ovJxNaM6uVFg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 10:53 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>
> Are these TODO items?
>

BTW, there are couple of TODOs.

1. Analyze should be done based on the total row churn across the
parent + children. Looking at the parent only, as we do now, can
result in analyzing too often or too seldom.

2. Auto-analyze should analyze parent and each child table in a
separate transaction.

Thanks,
Pavan

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Vlad Bailescu <vlad(at)mojitosoftware(dot)com>
Subject: Re: Turning auto-analyze off (was Re: [GENERAL] Unusually high IO for autovacuum worker)
Date: 2013-02-05 08:23:02
Message-ID: 20130205082302.GB32520@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 5, 2013 at 01:41:05PM +0530, Pavan Deolasee wrote:
> On Fri, Feb 1, 2013 at 10:53 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> >
> > Are these TODO items?
> >
>
> BTW, there are couple of TODOs.
>
> 1. Analyze should be done based on the total row churn across the
> parent + children. Looking at the parent only, as we do now, can
> result in analyzing too often or too seldom.
>
> 2. Auto-analyze should analyze parent and each child table in a
> separate transaction.

I added a link to this thread on the TODO list under:

Improve autovacuum tuning

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +