ANALYZE to be ignored by VACUUM

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: ANALYZE to be ignored by VACUUM
Date: 2008-02-15 07:59:44
Message-ID: 20080215152826.A861.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers

When there are a heavily updated table and a large table at the same time,
ANALYZE against the large table disturbs other autovacuums and HOT updates.
In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
and 5 min at 100), but the updated table needs to be vacuumed every 30 seconds
because seqscans run on the table repeatedly.

ANALYZE runs as a transaction. While a long transaction is alive,
VACUUM cannot remove dead tuples that were deleted after the transaction
began, and HOT cannot prune them either. It would be nice if VACUUM could
get along with long transactions, but that requires some kind of complex
management of the vacuum horizon, and I have no idea how to do it...

So, I'm targeting only ANALYZE for now, by changing ANALYZE to be ignorable
by VACUUM. This is the same treatment VACUUM itself has received since
version 8.2, where one VACUUM is ignored by other VACUUMs.
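
For illustration, a minimal sketch (not the actual PostgreSQL source) of
how a global-xmin computation can skip backends that advertise
PROC_IN_VACUUM. The real logic lives in GetOldestXmin(); the struct and
helper below are made-up stand-ins, and wraparound-aware xid comparison
is ignored:

    #include <stdint.h>

    typedef uint32_t TransactionId;

    #define PROC_IN_VACUUM 0x01    /* backend runs a lazy VACUUM (and, under
                                    * this proposal, an ANALYZE sampling xact) */

    typedef struct FakeProc
    {
        TransactionId xmin;        /* oldest xid this backend can still see */
        uint8_t       vacuumFlags; /* PROC_IN_* status bits */
    } FakeProc;

    /* Compute the oldest xmin VACUUM must preserve, ignoring backends that
     * advertise PROC_IN_VACUUM: their snapshots never read other tables, so
     * they cannot observe tuples VACUUM would remove there. */
    static TransactionId
    sketch_oldest_xmin(const FakeProc *procs, int nprocs, TransactionId latest)
    {
        TransactionId result = latest;

        for (int i = 0; i < nprocs; i++)
        {
            if (procs[i].vacuumFlags & PROC_IN_VACUUM)
                continue;          /* this xact cannot hold back the horizon */
            if (procs[i].xmin != 0 && procs[i].xmin < result)
                result = procs[i].xmin;
        }
        return result;
    }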

My proposal is to split ANALYZEs that run with use_own_xacts (VACUUM ANALYZE,
autovacuum, or database-wide ANALYZE) into two transactions:
T1: acquire_sample_rows()
T2: compute_stats() and update_attstats()
and to set PROC_IN_VACUUM during T1.
T1 takes a long time because it reads pages to sample rows.
T2 is relatively short because the stats are computed in memory, without I/O.
T2 needs a consistent snapshot because user-defined functions in
expression indexes might use it.
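
A rough sketch of the proposed control flow, with every PostgreSQL
internal replaced by a hypothetical stand-in (the real change would go
through analyze_rel() in commands/analyze.c; locking and error handling
are omitted):

    #include <stddef.h>

    /* Illustrative stand-ins for PostgreSQL internals (all hypothetical). */
    typedef struct Relation   { const char *name; } Relation;
    typedef struct SampleRows { size_t n; } SampleRows;

    static void start_xact(void)          { /* BEGIN */ }
    static void commit_xact(void)         { /* COMMIT */ }
    static void set_proc_in_vacuum(void)  { /* advertise PROC_IN_VACUUM */ }
    static SampleRows acquire_sample(Relation *r)
    {
        SampleRows s = {0};   /* stands in for acquire_sample_rows() */
        return s;
    }
    static void compute_and_store_stats(Relation *r, SampleRows *s)
    {
        /* stands in for compute_stats() + update_attstats() */
    }

    /* The proposed split: T1 samples under PROC_IN_VACUUM; T2 computes the
     * stats and writes them out under a normal snapshot. */
    void analyze_in_two_xacts(Relation *rel)
    {
        start_xact();              /* T1: long, I/O-bound */
        set_proc_in_vacuum();      /* other VACUUMs may now ignore us */
        SampleRows rows = acquire_sample(rel);
        commit_xact();             /* drop our snapshot as soon as possible */

        start_xact();              /* T2: short, in-memory */
        compute_and_store_stats(rel, &rows);
        commit_xact();
    }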

Is it an acceptable approach? Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-15 15:28:15
Message-ID: 2987.1203089295@sss.pgh.pa.us
Lists: pgsql-hackers

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
> and 5 min at 100), but the updated table needs to be vacuumed every 30 seconds
> because seqscans run on the table repeatedly.

There is something *seriously* wrong with that. If vacuum can complete
in under 30 seconds, how can analyze take a minute? (I'm also wondering
whether you'll still need such frantic vacuuming with HOT...)

> So, I'm targeting only ANALYZE for now, by changing ANALYZE to be ignorable
> by VACUUM.

I think we need to understand what the real problem is with your test
case. This proposal seems very messy/ugly to me, and I'm unconvinced
that it solves anything.

regards, tom lane


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-19 07:31:20
Message-ID: 20080219160731.6831.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> > In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
> > and 5 min at 100), but the updated table needs to be vacuumed every 30 seconds
> > because seqscans run on the table repeatedly.
>
> There is something *seriously* wrong with that. If vacuum can complete
> in under 30 seconds, how can analyze take a minute? (I'm also wondering
> whether you'll still need such frantic vacuuming with HOT...)

There are two tables here:
[S] A small table that is frequently updated and seqscanned
[L] A large table that takes a long time to analyze

Table [S] should be vacuumed every 30 seconds, because dead tuples
seriously hurt the performance of its seqscans. HOT and autovacuum are
very useful here *unless* a long transaction begins.
Analyzing [L] takes 1-5 minutes, and neither HOT nor VACUUM works
during that time. I would really like to use statistics_target = 100
for more accurate statistics, but I'm using 10 instead to avoid the
long transactions caused by ANALYZE.

Basically, all of the above comes down to avoiding needless long
transactions. Aside from ANALYZE, pg_start_backup() is also a long
transactional command: it can take up to checkpoint_timeout *
checkpoint_completion_target (2.5 minutes at the defaults) in the worst
case. Users can avoid their own long transactions, but they have no
choice but to use these built-in maintenance commands.

> > So, I'm targeting only ANALYZE for now, by changing ANALYZE to be ignorable
> > by VACUUM.
>
> I think we need to understand what the real problem is with your test
> case. This proposal seems very messy/ugly to me, and I'm unconvinced
> that it solves anything.

I think there are some direct or indirect solutions:

1. VACUUM removes recently dead tuples under some circumstances.
For example, consider a tuple updated twice after a long transaction
begins. The oldest version can be seen by the old long transaction
and the newest by new transactions, but the intermediate version is
invisible to all transactions. (A simplified sketch of this visibility
condition follows the list.)

2. ANALYZE doesn't disturb vacuuming of other tables. (my original proposal)
We know ANALYZE doesn't touch other tables during its sampling phase,
so we can treat analyzing transactions the same as PROC_IN_VACUUM xacts.
The same could be said for pg_start_backup; a non-transactional
start-backup command might be better.

3. Recover tuple density afterward; i.e., auto-CLUSTER.
If performance recovers after the long transaction ends, the problem
is not so serious. It would be better if autovacuum could invoke
CLUSTER when required, and if CLUSTER could run concurrently.

4. ANALYZE finishes in a short time.
It is OK for VACUUM to take a long time because it is not a transaction,
but ANALYZE should not. This requires a cleverer statistics algorithm:
a statistics target of 10 is not enough for pg_stats.n_distinct, and we
seem to estimate n_distinct too low for clustered (ordered) tables.
The calculation of n_distinct may be a research topic of its own.
Also, this cannot resolve the problem with pg_start_backup.
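
To make option 1 concrete, here is a simplified sketch of the visibility
test it implies, under a toy model where a snapshot is just a single xid
and in-progress transactions and wraparound are ignored (all names are
hypothetical):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t Xid;

    /* A committed tuple version inserted by xmin and deleted by xmax.
     * In this toy model a snapshot taken "at" xid s sees the version
     * iff xmin <= s < xmax. */
    static bool
    version_visible_to_some_snapshot(Xid xmin, Xid xmax,
                                     const Xid *snaps, int nsnaps)
    {
        for (int i = 0; i < nsnaps; i++)
            if (snaps[i] >= xmin && snaps[i] < xmax)
                return true;
        return false;   /* dead to everyone: VACUUM could reclaim it even
                         * while an older long transaction is still running */
    }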

1 or 3 might be the more universal approaches, but I think 2 is an
improvement independent of them.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-19 08:56:04
Message-ID: 87zltxz5nf.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:

> 4. ANALYZE finishes in a short time.
> It is OK for VACUUM to take a long time because it is not a transaction,
> but ANALYZE should not. This requires a cleverer statistics algorithm:
> a statistics target of 10 is not enough for pg_stats.n_distinct, and we
> seem to estimate n_distinct too low for clustered (ordered) tables.

Unfortunately, no constant-size sample is going to be enough for reliable
n_distinct estimates. To estimate n_distinct you really have to see a
percentage of the table, and to get good estimates that percentage has to
be fairly large.

There was a paper with a nice algorithm posted a while back which required
only constant memory but it depended on scanning the entire table. I think to
do n_distinct estimates we'll need some statistics which are either gathered
opportunistically whenever a seqscan happens or maintained by an index.
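
For what it's worth, a minimal sketch of a constant-memory estimator in
that spirit, along the lines of Flajolet-Martin probabilistic counting.
It needs one full pass over the values (a seqscan, not a fixed-size
sample), and a real implementation would average many such sketches to
reduce the variance:

    #include <stdint.h>

    typedef struct { int max_trailing_zeros; } FMSketch;

    /* A simple 64-bit mixer (MurmurHash3 finalizer constants); production
     * code would hash the whole datum, not just a 64-bit key. */
    static uint64_t mix64(uint64_t x)
    {
        x ^= x >> 33;  x *= 0xff51afd7ed558ccdULL;
        x ^= x >> 33;  x *= 0xc4ceb9fe1a85ec53ULL;
        x ^= x >> 33;  return x;
    }

    /* Feed one value; only the maximum number of trailing zero bits seen
     * in any hash is retained, so the state is O(1). */
    static void fm_add(FMSketch *s, uint64_t value)
    {
        uint64_t h = mix64(value);
        int tz = 0;
        while (tz < 63 && ((h >> tz) & 1) == 0)
            tz++;
        if (tz > s->max_trailing_zeros)
            s->max_trailing_zeros = tz;
    }

    /* Roughly 2^R distinct values are needed before some hash ends in R
     * zero bits; 0.77351 is the standard Flajolet-Martin correction. */
    static double fm_estimate(const FMSketch *s)
    {
        return (double) (1ULL << s->max_trailing_zeros) / 0.77351;
    }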

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-20 04:17:45
Message-ID: 20080220130529.6387.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


Gregory Stark <stark(at)enterprisedb(dot)com> wrote:

> > 4. ANALYZE finishes in a short time.
> There was a paper with a nice algorithm posted a while back which required
> only constant memory but it depended on scanning the entire table. I think to
> do n_distinct estimates we'll need some statistics which are either gathered
> opportunistically whenever a seqscan happens or maintained by an index.

VACUUM would be another good opportunity for the algorithm, because it
does a seqscan. If we do that, we would still need to separate the
analyzing transaction into sampling and stats-updating transactions to
keep vacuums ignorable.

However, VACUUM will no longer do a full seqscan once we have a Dead
Space Map or Segment Visibility Map. We will need incremental statistics
updating if reliable n_distinct estimation requires that many samples.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-20 16:45:14
Message-ID: 758d5e7f0802200845v7b5dde22v1bd82c49755a1a53@mail.gmail.com
Lists: pgsql-hackers

On Feb 19, 2008 8:31 AM, ITAGAKI Takahiro
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> > ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> > > In my workload, ANALYZE takes a long time (1 min at statistics_target = 10,
> > > and 5 min at 100), but the updated table needs to be vacuumed every 30 seconds
> > > because seqscans run on the table repeatedly.
> >
> > There is something *seriously* wrong with that. If vacuum can complete
> > in under 30 seconds, how can analyze take a minute? (I'm also wondering
> > whether you'll still need such frantic vacuuming with HOT...)
>
> There are two tables here:
> [S] A small table that is frequently updated and seqscanned
> [L] A large table that takes a long time to analyze
>
> Table [S] should be vacuumed every 30 seconds, because dead tuples
> seriously hurt the performance of its seqscans. HOT and autovacuum are
> very useful here *unless* a long transaction begins.
> Analyzing [L] takes 1-5 minutes, and neither HOT nor VACUUM works
> during that time. I would really like to use statistics_target = 100
> for more accurate statistics, but I'm using 10 instead to avoid the
> long transactions caused by ANALYZE.

I am sure the idea is not original, but still I would like to know how
hard it would be to support local (per-table) oldest visible XIDs.

I mean, currently when a transaction starts you need to keep all tuples
with xmin >= oldest_xid in all tables, because who knows which tables
that transaction might touch.
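
As a thought experiment, a toy sketch of what per-table horizon tracking
might look like; everything here is hypothetical, and locking, shared
memory placement, xid wraparound, and un-pinning tables at commit (which
needs a recomputation) are all elided:

    #include <stdint.h>

    typedef uint32_t Xid;

    #define MAX_TABLES 1024

    typedef struct
    {
        Xid oldest_xmin[MAX_TABLES];  /* 0 = no transaction pins this table */
    } PerTableHorizons;

    /* Called the first time a transaction whose snapshot xmin is 'xmin'
     * reads or writes table 'relid': only then does it start holding back
     * that table's horizon. */
    static void
    pin_table(PerTableHorizons *h, int relid, Xid xmin)
    {
        if (h->oldest_xmin[relid] == 0 || xmin < h->oldest_xmin[relid])
            h->oldest_xmin[relid] = xmin;
    }

    /* VACUUM on 'relid' may remove tuples deleted before this horizon,
     * regardless of long transactions that never touched the table. */
    static Xid
    table_vacuum_horizon(const PerTableHorizons *h, int relid, Xid latest)
    {
        return h->oldest_xmin[relid] ? h->oldest_xmin[relid] : latest;
    }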

But then again, there is a relatively large list of cases where we don't
need to hold back vacuum on _all_ relations. These include:

SELECTs in auto-commit mode -- provided the SELECT is not something
fancy (no non-immutable PL functions), we just need to keep a snapshot
of the affected tables.

DMLs in auto-commit mode -- provided no PL functions or triggers are in
effect.

WITH HOLD cursors. Yes, I know, a WITH HOLD cursor creates a copy of the
rows to be returned at the first COMMIT (which can take a long time in
some cases), but perhaps it would be possible to just "lock" the table
against vacuuming and skip the temporary store.

And lots of others when done in auto-commit mode, like ALTER TABLE,
CREATE TABLE AS SELECT, COPY, etc...

I am sure that such an idea isn't original. What are the main obstacles
to making it happen, apart from "time&money"? :)

Regards,
Dawid


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ANALYZE to be ignored by VACUUM
Date: 2008-02-21 04:35:23
Message-ID: 20080221132641.B2BD.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


"Dawid Kuroczko" <qnex42(at)gmail(dot)com> wrote:

> I am sure the idea is not original, but still I would like to know how
> hard it would be to support local (per-table) oldest visible XIDs.
>
> I mean, currently when a transaction starts you need to keep all tuples
> with xmin >= oldest_xid in all tables, because who knows which tables
> that transaction might touch.

Per-table oldest XID management sounds good! You mean that transactions
which touch no tables do not affect vacuums at all, right?
If so, this solution could resolve the pg_start_backup problem, too.

I feel it is enough for the standard maintenance commands. Another
solution might be needed for user-defined long transactions, though.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center