Autoanalyze and OldestXmin

Lists: pgsql-hackers
From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Autoanalyze and OldestXmin
Date: 2011-06-08 12:48:35
Message-ID: BANLkTikHq3mtk2G_LO-Hk33khPVM0q5caA@mail.gmail.com

Hi All,

I was running some pgbench tests and observed this phenomenon. This might be
a known issue, but I thought it's nevertheless worth mentioning.

The auto-analyze process grabs an MVCC snapshot. If it runs on a very large
table, it may take considerable time and stop OldestXmin from advancing.
During that time, heavily updated small tables bloat badly. For example, in
the attached log snippet (HEAD is patched a bit to produce more information
than you would otherwise see), for a scale factor of 50 and 50 clients:

the branches and tellers tables, which had a stable size of around 65 and 90
pages respectively, bloat to 402 and 499 pages respectively while the
accounts table is being analyzed. The accounts table analyze takes around 5
minutes on my decent server, and the branches and tellers tables keep
bloating during that time. If these small tables are very actively accessed,
vacuum may not even be able to truncate them later, once OldestXmin advances
at the end of the auto-analyze.

I understand analyze needs a snapshot to run index predicate functions, but
is there something we can do? There is a PROC_IN_ANALYZE flag, but we don't
seem to be using it anywhere. Since acquire_sample_rows() returns palloc'd
tuples, can't we let OldestXmin advance after scanning a page by ignoring
procs with the flag set, just like we do for PROC_IN_VACUUM?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

Attachment: head-vac-spinnet.txt (text/plain, 17.4 KB)

From: Greg Stark <stark(at)mit(dot)edu>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-08 15:33:35
Message-ID: BANLkTi=J+7bNAsUr-obBuq6W_PBnfU0R=w@mail.gmail.com

On Jun 8, 2011 1:49 PM, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> Hi All,
>
> There is a PROC_IN_ANALYZE flag, but we don't seem to be using it
> anywhere. Since acquire_sample_rows() returns palloced tuples, can't we let
> OldestXmin advance after scanning a page by ignoring procs with the flag
> set, just like we do for PROC_IN_VACUUM ?

I don't even think the pallocing of tuples is a necessary condition. The key
requirement is that this process will not access any other tables with this
snapshot, in which case we don't need to take it into account when vacuuming
other tables.

It's not safe to vacuum tuples from the table being analyzed because the
vacuum could get ahead of the analyze.

This is kind of like the other property it would be nice to know about
transactions: that they've locked all the tables they're going to lock. That
would be a sufficient but overly strong test. Alternatively, it's possible to
know that if other tables are accessed, it will be with a brand-new snapshot.


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-08 16:58:42
Message-ID: BANLkTikVZ9A=+Gimni3Y36A4MbXGuxSCBA@mail.gmail.com

On Wed, Jun 8, 2011 at 9:03 PM, Greg Stark <stark(at)mit(dot)edu> wrote:

>
> On Jun 8, 2011 1:49 PM, "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> >
> >
> > Hi All,
>
> > There is a PROC_IN_ANALYZE flag, but we don't seem to be using it
> > anywhere. Since acquire_sample_rows() returns palloced tuples, can't we let
> > OldestXmin advance after scanning a page by ignoring procs with the flag
> > set, just like we do for PROC_IN_VACUUM ?
>
> I don't even think the pallocing of tuples is a necessary condition. The
> key requirement is that this process will not access any other tables in
> this snapshot. In which case we don't need to take it into account when
> vacuuming other tables.
>
>
I first thought that analyze and vacuum cannot run concurrently on the same
table, since they take conflicting locks on the table. So even if we ignore
the analyze process while calculating the OldestXmin for vacuum, we should
be fine, since we know they are working on different tables. But I see
analyze also acquires sample rows from the inherited tables with a
non-conflicting lock. I probably do not understand the analyze code well,
but is that the reason why we can't ignore the analyze snapshot while
determining OldestXmin for vacuum?

> It's not safe to vacuum tuples from the table being analyzed because the
> vacuum could get ahead of the analyze.
>

What can go wrong if that happens? Is the worry that we might get stale
analyze results, or are there more serious issues to deal with?

> This is kind of like the other property it would be nice to know about
> transactions: that they've locked all the tables they're going to lock. That
> would be sufficient but overly strong test. It's possible to know that if
> other tables are accessed they'll be with a brand new snapshot.
>
I definitely do not understand this :-)

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-08 17:15:07
Message-ID: 14809.1307553307@sss.pgh.pa.us

Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> I first thought that analyze and vacuum cannot run concurrently on the same
> table, since they take conflicting locks on the table. So even if we ignore
> the analyze process while calculating the OldestXmin for vacuum, we should
> be fine, since we know they are working on different tables. But I see
> analyze also acquires sample rows from the inherited tables with a
> non-conflicting lock. I probably do not understand the analyze code well,
> but is that the reason why we can't ignore the analyze snapshot while
> determining OldestXmin for vacuum?

The reason why we can't ignore that snapshot is that it's being set for
the use of user-defined functions, which might do practically anything.
They definitely could access tables other than the one under analysis.
(I believe that PostGIS does such things, for example --- it wants to
look at its auxiliary tables for metadata.)

Also keep in mind that we allow ANALYZE to be run inside a transaction
block, which might contain other operations sharing the same snapshot.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-08 22:39:08
Message-ID: 016FA578-A1C8-4232-A5BF-017C67A77EB3@nasby.net

On Jun 8, 2011, at 10:33 AM, Greg Stark wrote:
> This is kind of like the other property it would be nice to know about transactions: that they've locked all the tables they're going to lock.
That sounds like something I've wanted for a very long time: the ability for a transaction to declare exactly which tables it's going to access. Presumably disallowing it from taking out any further table locks (anything you do on a table needs at least a share lock, right?) would take care of that.

If we had that information vacuum could ignore the old snapshots on those tables, so long as it ensures that the vacuum process itself can't read anything from those tables (handling the functional index issue Tom mentioned).
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 06:20:06
Message-ID: BANLkTi=ejhonrxrs5s2P4M-NGG3FHCfqCg@mail.gmail.com

On Wed, Jun 8, 2011 at 10:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> writes:
> > I first thought that analyze and vacuum can not run concurrently on the same
> > table since they take a conflicting lock on the table. So even if we ignore
> > the analyze process while calculating the OldestXmin for vacuum, we should
> > be fine since we know they are working on different tables. But I see
> > analyze also acquires sample rows from the inherited tables with a
> > non-conflicting lock. I probably do not understand the analyze code well,
> > but is that the reason why we can't ignore analyze snapshot while
> > determining OldestXmin for vacuum ?
>
> The reason why we can't ignore that snapshot is that it's being set for
> the use of user-defined functions, which might do practically anything.
> They definitely could access tables other than the one under analysis.
> (I believe that PostGIS does such things, for example --- it wants to
> look at its auxiliary tables for metadata.)
>
> Also keep in mind that we allow ANALYZE to be run inside a transaction
> block, which might contain other operations sharing the same snapshot.
>
>
Ah, I see. Would there be benefits if we did some special handling for cases
where we know that ANALYZE is running outside a transaction block and that
it's not going to invoke any user-defined functions? If the user is running
ANALYZE inside a transaction block, he is probably already aware of, and
ready to handle, the long-running transaction. But running one under the
covers as part of auto-analyze does not seem quite right. The pgbench test
already shows the severe bloat that a long-running analyze may cause for
small tables, and the many wasteful vacuum runs on those tables.

Another idea would be to split the ANALYZE into multiple small transactions,
each taking a new snapshot. That might result in bad statistics if the table
is undergoing huge change, but in that case the stats will be outdated soon
anyway if we run with an old snapshot. I understand there could be issues
like counting the same tuple twice or more, but would that be a common case
to worry about?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 09:26:02
Message-ID: BANLkTik9XipfvpCbaO2e4cH-XsYV2gSDDQ@mail.gmail.com

On Thu, Jun 9, 2011 at 11:50 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

>
>
>
> Ah, I see. Would there be benefits if we did some special handling
> for cases where we know that ANALYZE is running outside a transaction block
> and that it's not going to invoke any user-defined functions? If the user is
> running ANALYZE inside a transaction block, he is probably already aware of,
> and ready to handle, the long-running transaction. But running one under the
> covers as part of auto-analyze does not seem quite right. The pgbench test
> already shows the severe bloat that a long-running analyze may cause for
> small tables, and the many wasteful vacuum runs on those tables.
>
> Another idea would be to split the ANALYZE into multiple small
> transactions, each taking a new snapshot. That might result in bad
> statistics if the table is undergoing huge change, but in that case the
> stats will be outdated soon anyway if we run with an old snapshot. I
> understand there could be issues like counting the same tuple twice or more,
> but would that be a common case to worry about?
>
>
FWIW, I searched the archives again, and it seems ITAGAKI Takahiro
complained about the same issue in the past and had some ideas (including
splitting the one long transaction). We did not bring the discussion to a
conclusion that time, but I hope we can make some progress now, unless we
are certain that there is no low-hanging fruit here.

http://archives.postgresql.org/pgsql-hackers/2008-02/msg00574.php

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 14:36:59
Message-ID: BANLkTim4Mod9UFDTDwDKPqbGCMHFJ22ZTQ@mail.gmail.com

On Thu, Jun 9, 2011 at 2:20 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> Ah, I see. Would there be benefits if we did some special handling
> for cases where we know that ANALYZE is running outside a transaction block
> and that it's not going to invoke any user-defined functions?

We'd have to distinguish between user-defined typanalyze functions and
system-defined typanalyze functions, which doesn't seem too appealing, or
robust.

> If the user is
> running ANALYZE inside a transaction block, he is probably already aware of,
> and ready to handle, the long-running transaction. But running one under the
> covers as part of auto-analyze does not seem quite right. The pgbench test
> already shows the severe bloat that a long-running analyze may cause for
> small tables, and the many wasteful vacuum runs on those tables.
> Another idea would be to split the ANALYZE into multiple small transactions,
> each taking a new snapshot. That might result in bad statistics if the table
> is undergoing huge change, but in that case the stats will be outdated soon
> anyway if we run with an old snapshot. I understand there could be issues
> like counting the same tuple twice or more, but would that be a common case
> to worry about?

I am wondering if we shouldn't be asking ourselves a different
question: why is ANALYZE running long enough on your tables for this
to become an issue? How long is it taking?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 14:52:59
Message-ID: 92E3A92E-D555-46DC-80B2-01590E59063A@gmail.com

>
> I am wondering if we shouldn't be asking ourselves a different
> question: why is ANALYZE running long enough on your tables for this
> to become an issue? How long is it taking?
>

The log file attached in the first post has the details; it's taking around 5 minutes for the accounts table with scale factor 50 and 50 clients.

Thanks,
Pavan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 14:58:30
Message-ID: BANLkTi=-+Q3Q6iNLhCy75E2GS1P3i1uf8g@mail.gmail.com

On Thu, Jun 9, 2011 at 10:52 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>> I am wondering if we shouldn't be asking ourselves a different
>> question: why is ANALYZE running long enough on your tables for this
>> to become an issue?  How long is it taking?
>
> The log file attached in the first post has the details; it's taking around 5 mins for the accounts table with 50 scale factor and 50 clients

Wow, that's slow. Still, what if the user were doing a transaction of
comparable size? It's not like ANALYZE is doing a gigantic amount of
work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 14:59:54
Message-ID: 24026.1307631594@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jun 9, 2011 at 10:52 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>> I am wondering if we shouldn't be asking ourselves a different
>>> question: why is ANALYZE running long enough on your tables for this
>>> to become an issue? How long is it taking?

>> The log file attached in the first post has the details; it's taking around 5 mins for the accounts table with 50 scale factor and 50 clients

> Wow, that's slow. Still, what if the user were doing a transaction of
> comparable size? It's not like ANALYZE is doing a gigantic amount of
> work.

I wonder what vacuum cost delay settings are in use ...

regards, tom lane


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autoanalyze and OldestXmin
Date: 2011-06-09 15:05:52
Message-ID: FD21FC80-4C2D-47D8-903D-A2E9AD9C3945@gmail.com

On 09-Jun-2011, at 8:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jun 9, 2011 at 10:52 AM, Pavan Deolasee
>> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>>> I am wondering if we shouldn't be asking ourselves a different
>>>> question: why is ANALYZE running long enough on your tables for this
>>>> to become an issue? How long is it taking?
>
>>> The log file attached in the first post has the details; it's taking around 5 mins for the accounts table with 50 scale factor and 50 clients
>
>> Wow, that's slow. Still, what if the user were doing a transaction of
>> comparable size? It's not like ANALYZE is doing a gigantic amount of
>> work.
>
> I wonder what vacuum cost delay settings are in use ...
>

Default settings, with 512MB of shared_buffers.

Thanks.
Pavan