Re: VACUUM ANALYZE is faster than ANALYZE?

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-21 14:00:00
Message-ID: CAFj8pRAhJzS4xVTrB7CkuTE1gt2fgtaBEkiKo891poRUdTQYTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
expected so ANALYZE should be faster then VACUUM ANALYZE.

But is not true. Why?

Regards

Pavel Stehule


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 12:59:39
Message-ID: CA+Tgmob=6Ve2=LoirsToHRdfD5AAJOW4_ZRFp-qG0LR_AGMRtg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
> expected so ANALYZE should be faster then VACUUM ANALYZE.
>
> But is not true. Why?

I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
general, because VACUUM has to scan the whole table, and ANALYZE only
a fixed-size subset of its pages. Not sure what's happening in your
particular case...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 13:13:34
Message-ID: CAP-rdTZRDc4P4Zg=+U-Ct68S5V6ri=EeFv3HSVc8TSVgRQMA=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/2/22 Robert Haas <robertmhaas(at)gmail(dot)com>:

> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>> expected so ANALYZE should be faster then VACUUM ANALYZE.
>>
>> But is not true. Why?
>
> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
> general, because VACUUM has to scan the whole table, and ANALYZE only
> a fixed-size subset of its pages.

It sounds like you just said the opposite of what you wanted to say.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 13:33:07
Message-ID: CA+TgmoZmf36SNXzsVUbKRXq6bo-+w9M0yuAEJ7gJkppg_g5b=g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier
<nicolas(dot)barbier(at)gmail(dot)com> wrote:
> 2012/2/22 Robert Haas <robertmhaas(at)gmail(dot)com>:
>
>> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>
>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>> expected so ANALYZE should be faster then VACUUM ANALYZE.
>>>
>>> But is not true. Why?
>>
>> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
>> general, because VACUUM has to scan the whole table, and ANALYZE only
>> a fixed-size subset of its pages.
>
> It sounds like you just said the opposite of what you wanted to say.

Yeah, I did. Woops. Let me try that again:

ANALYZE should be faster; reads only some pages.

VACUUM ANALYZE should be slower; reads them all.

Dunno why Pavel's seeing the opposite without more info.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 13:38:40
Message-ID: CA+U5nMJTjOb7+qJsrRaTjZsWPOAw26_iXD5cqPm-_vz5nQGxDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
> expected so ANALYZE should be faster then VACUUM ANALYZE.

VACUUM ANALYZE scans the whole table sequentially.

ANALYZE accesses a random sample of data blocks. Random access is
slower than sequential access, so at some threshold of sample size and
sequential/random I/O speed ratio ANALYZE could become slower.

So it depends upon the hardware and the setting of stats_target.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 15:29:56
Message-ID: 17867.1329924596@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>> expected so ANALYZE should be faster then VACUUM ANALYZE.

> VACUUM ANALYZE scans the whole table sequentially.

> ANALYZE accesses a random sample of data blocks. Random access is
> slower than sequential access, so at some threshold of sample size and
> sequential/random I/O speed ratio ANALYZE could become slower.

That analysis is entirely wrong. In the first place, although ANALYZE
doesn't read all the blocks, what it does read it reads in block number
order. So it's not like there are "random" seeks all over the disk that
would not need to happen anyway. In the second place, VACUUM ANALYZE
consists of two separate passes, VACUUM and then ANALYZE, and the second
pass is going to be "random" I/O by your definition no matter what.

If the filesystem is hugely biased towards sequential I/O for some
reason, and the VACUUM scan causes the whole table to become resident in
RAM where ANALYZE can read it "for free", then I guess it might be
possible to arrive at Pavel's result. But it would be an awfully narrow
corner case. I cannot believe that his statement is true in general,
or even for a noticeably large fraction of cases.

regards, tom lane


From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 15:35:51
Message-ID: 20120222153551.GN21114@aart.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 10:29:56AM -0500, Tom Lane wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> > On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> >> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
> >> expected so ANALYZE should be faster then VACUUM ANALYZE.
>
> > VACUUM ANALYZE scans the whole table sequentially.
>
> > ANALYZE accesses a random sample of data blocks. Random access is
> > slower than sequential access, so at some threshold of sample size and
> > sequential/random I/O speed ratio ANALYZE could become slower.
>
> That analysis is entirely wrong. In the first place, although ANALYZE
> doesn't read all the blocks, what it does read it reads in block number
> order. So it's not like there are "random" seeks all over the disk that
> would not need to happen anyway. In the second place, VACUUM ANALYZE
> consists of two separate passes, VACUUM and then ANALYZE, and the second
> pass is going to be "random" I/O by your definition no matter what.
>
> If the filesystem is hugely biased towards sequential I/O for some
> reason, and the VACUUM scan causes the whole table to become resident in
> RAM where ANALYZE can read it "for free", then I guess it might be
> possible to arrive at Pavel's result. But it would be an awfully narrow
> corner case. I cannot believe that his statement is true in general,
> or even for a noticeably large fraction of cases.
>
> regards, tom lane
>

Wouldn't a full sequential scan trigger the kernel read-ahead, which
might not trigger for the analyze block reads, even though they are
in order? That could account for the observation.

Regards,
Ken


From: Jeroen Vermeulen <jtv(at)xs4all(dot)nl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 17:54:41
Message-ID: 4F452BE1.10602@xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2012-02-22 16:29, Tom Lane wrote:

(Snip context)

> VACUUM ANALYZE
> consists of two separate passes, VACUUM and then ANALYZE, and the second
> pass is going to be "random" I/O by your definition no matter what.

I don't suppose there's a case where the VACUUM (1) gets to delete lots
and lots of rows that then don't need ANALYZE'ing, and (2) can do so
without actually touching all those pages?

Jeroen


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 18:36:38
Message-ID: CAFj8pRBE1XM0pUyhqXTjo2G5r5ay_U-zEBx_nEAfwwXKYHAhow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/2/22 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Feb 22, 2012 at 8:13 AM, Nicolas Barbier
> <nicolas(dot)barbier(at)gmail(dot)com> wrote:
>> 2012/2/22 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>
>>> On Tue, Feb 21, 2012 at 9:00 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>>
>>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>>> expected so ANALYZE should be faster then VACUUM ANALYZE.
>>>>
>>>> But is not true. Why?
>>>
>>> I'm pretty sure that VACUUM ANALYZE *will* be faster than ANALYZE in
>>> general, because VACUUM has to scan the whole table, and ANALYZE only
>>> a fixed-size subset of its pages.
>>
>> It sounds like you just said the opposite of what you wanted to say.
>
> Yeah, I did.  Woops.  Let me try that again:
>
> ANALYZE should be faster; reads only some pages.
>
> VACUUM ANALYZE should be slower; reads them all.
>
> Dunno why Pavel's seeing the opposite without more info.

usual pattern in our application is

create table xx1 as select ....
analyze xx1
create table xx2 as select .... from xx1, ....
analyze xx2
create table xx3 as select ... from xx3, ....
analyze xx3
create table xx4 as select ... from xx1, ...

tables xx** are use as cache.

so we have to refresh statistic early.

in this situation - and I found so in this case VACUUM ANALYZE is
faster (30%) than ANALYZE. Size of xx** is usually between 500Kb and
8Kb

This is not usual pattern for OLTP - Application is strictly OLAP.

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Nicolas Barbier" <nicolas(dot)barbier(at)gmail(dot)com>, "PostgreSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 18:57:12
Message-ID: 4F44E6280200002500045A2F@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:

> usual pattern in our application is
>
> create table xx1 as select ....
> analyze xx1
> create table xx2 as select .... from xx1, ....
> analyze xx2
> create table xx3 as select ... from xx3, ....
> analyze xx3
> create table xx4 as select ... from xx1, ...
>
> tables xx** are use as cache.
>
> so we have to refresh statistic early.
>
> in this situation - and I found so in this case VACUUM ANALYZE is
> faster (30%) than ANALYZE. Size of xx** is usually between 500Kb
> and 8Kb
>
> This is not usual pattern for OLTP - Application is strictly OLAP.

Is the VACUUM ANALYZE step faster, or is the overall job faster if
VACUUM ANALYZE is run? You may be running into the need to rewrite
pages at an inopportune time or order without the VACUUM. Have you
tried getting a time VACUUM FREEZE ANALYZE on these cache tables
instead of plain VACUUM ANALYZE?

-Kevin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 19:12:35
Message-ID: CAFj8pRB7Qd=Ew4CUOx6J4U+ujjTcyv3m24_kneOCEQfe2Ti7mQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/2/22 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>
>> usual pattern in our application is
>>
>> create table xx1 as select ....
>> analyze xx1
>> create table xx2 as select .... from xx1, ....
>> analyze xx2
>> create table xx3 as select ... from xx3, ....
>> analyze xx3
>> create table xx4 as select ... from xx1, ...
>>
>> tables xx** are use as cache.
>>
>> so we have to refresh statistic early.
>>
>> in this situation - and I found so in this case VACUUM ANALYZE is
>> faster (30%) than ANALYZE. Size of xx** is usually between 500Kb
>> and 8Kb
>>
>> This is not usual pattern for OLTP - Application is strictly OLAP.
>
> Is the VACUUM ANALYZE step faster, or is the overall job faster if
> VACUUM ANALYZE is run?  You may be running into the need to rewrite
> pages at an inopportune time or order without the VACUUM.  Have you
> tried getting a time VACUUM FREEZE ANALYZE on these cache tables
> instead of plain VACUUM ANALYZE?
>
> -Kevin

vacuum freeze analyze is slower as expected. vacuum analyze is little
bit faster or same in any step then analyze.

I expected so just analyze should be significantly faster and it is not.

Tom's demonstration is enough for me. ANALYZE doesn't read complete
table, but uses random IO. VACUUM ANALYZE reads complete table, but it
uses seq IO and vacuum is fast (because it does nothing) in our case.

Thank You

Pavel


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 19:23:10
Message-ID: CA+U5nMJLDVMqzrVpvrTBjcEYrsXMyPN2AtS668j9XWpr2K-QYQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 3:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
>> On Tue, Feb 21, 2012 at 2:00 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> I had to reply to query about usage VACUUM ANALYZE or ANALYZE. I
>>> expected so ANALYZE should be faster then VACUUM ANALYZE.
>
>> VACUUM ANALYZE scans the whole table sequentially.
>
>> ANALYZE accesses a random sample of data blocks. Random access is
>> slower than sequential access, so at some threshold of sample size and
>> sequential/random I/O speed ratio ANALYZE could become slower.
>
> That analysis is entirely wrong.  In the first place, although ANALYZE
> doesn't read all the blocks, what it does read it reads in block number
> order.  So it's not like there are "random" seeks all over the disk that
> would not need to happen anyway.

Entirely right it would seem, since your later comments match my own.

The industry accepted description for non-sequential access is "random
access" whether or not the function that describes the movement is
entirely random. To argue otherwise is merely hairsplitting.

The disk access is not-sequential for ANALYZE. Not-sequential access
is slower on some hardware, and so given a large enough sample it can
account for the observed difference.

Additional access to the disk while the ANALYZE was running would
actually make it fully random, if anyone really cares.

> If the filesystem is hugely biased towards sequential I/O for some
> reason, and the VACUUM scan causes the whole table to become resident in
> RAM where ANALYZE can read it "for free", then I guess it might be
> possible to arrive at Pavel's result.  But it would be an awfully narrow
> corner case.  I cannot believe that his statement is true in general,
> or even for a noticeably large fraction of cases.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-22 22:02:43
Message-ID: CA+TgmoZiRK0jACG4L7md_RF7z7095QyraYt_GUKdpVosHfTeNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> The industry accepted description for non-sequential access is "random
> access" whether or not the function that describes the movement is
> entirely random. To argue otherwise is merely hairsplitting.

I don't think so. For example, a bitmap index scan contrives to speed
things up by arranging for the table I/O to happen in ascending block
number order, with skips, rather than in random order, as a plain
index scan would do, and that seems to be a pretty effective
technique. Except to the extent that it interferes with the kernel's
ability to do readahead, it really can't be to read blocks 1, 2, 3, 4,
and 5 than to read blocks 1, 2, 4, and 5. Not reading block 3 can't
require more effort than reading it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-23 08:34:18
Message-ID: CA+U5nMKUGRz9EJQoig-pJqWcdPE3haZC8iV+3nXK8MvHz5q8mg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> The industry accepted description for non-sequential access is "random
>> access" whether or not the function that describes the movement is
>> entirely random. To argue otherwise is merely hairsplitting.
>
> I don't think so.

PostgreSQL already uses a parameter called "random_page_cost" to
describe non-sequential access. Perhaps that is wrong and we need a
third parameter?

> For example, a bitmap index scan contrives to speed
> things up by arranging for the table I/O to happen in ascending block
> number order, with skips, rather than in random order, as a plain
> index scan would do, and that seems to be a pretty effective
> technique.  Except to the extent that it interferes with the kernel's
> ability to do readahead, it really can't be to read blocks 1, 2, 3, 4,
> and 5 than to read blocks 1, 2, 4, and 5.  Not reading block 3 can't
> require more effort than reading it.

By that argument, ANALYZE never could run longer than VACUUM ANALYZE,
so you disagree with Tom and I and you can't explain Pavel's
results....

cost_bitmap_heap_scan() uses "random_page_cost" to evaluate the cost
of accessing blocks, even though the author knew the access was in
ascending block number order. Why was that?

Note that the cost_bitmap_heap_scan() cost can be > than
cost-seqscan() for certain parameter values.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-23 14:44:12
Message-ID: CA+TgmoY=-SmJDBUoVSw+2H==9Ms0Pk4QMt-mub3VFy5A_MreHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 23, 2012 at 3:34 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Wed, Feb 22, 2012 at 10:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Feb 22, 2012 at 2:23 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>> The industry accepted description for non-sequential access is "random
>>> access" whether or not the function that describes the movement is
>>> entirely random. To argue otherwise is merely hairsplitting.
>>
>> I don't think so.
>
> PostgreSQL already uses  a parameter called "random_page_cost" to
> describe non-sequential access. Perhaps that is wrong and we need a
> third parameter?
>
>> For example, a bitmap index scan contrives to speed
>> things up by arranging for the table I/O to happen in ascending block
>> number order, with skips, rather than in random order, as a plain
>> index scan would do, and that seems to be a pretty effective
>> technique.  Except to the extent that it interferes with the kernel's
>> ability to do readahead, it really can't be to read blocks 1, 2, 3, 4,
>> and 5 than to read blocks 1, 2, 4, and 5.  Not reading block 3 can't
>> require more effort than reading it.
>
> By that argument, ANALYZE never could run longer than VACUUM ANALYZE,
> so you disagree with Tom and I and you can't explain Pavel's
> results....
>
> cost_bitmap_heap_scan() uses "random_page_cost" to evaluate the cost
> of accessing blocks, even though the author knew the access was in
> ascending block number order. Why was that?
>
> Note that the cost_bitmap_heap_scan() cost can be > than
> cost-seqscan() for certain parameter values.

I think all three of us are saying more or less the same thing in
slightly different words, so I'd rather not have an argument about
this one. But you're right: I can't explain Pavel's results, unless
doing ANALYZE before VACUUM is causing skip-block reads that defeat
the kernel's read-ahead detection. I think it's fairly self-evident
that reading a fixed-size subset of the pages in ascending order can't
*in general* be more expensive than reading all of an arbitrarily
large table, and so I believe we're all in agreement that the behavior
he observed is unusual. As to the cost estimation stuff, we use
random_page_cost as an approximation: there may be a head seek
involved, but to do better we'd have to estimate the likely length of
the seek based on the number of blocks skipped, something we currently
view as irrelevant, and it's not clear that it would improve the
quality of the estimate very much - there are other, probably larger
sources of error, such as the fact that the sequential logical block
number doesn't imply sequential physical position on the platter,
since the OS often fragments the file, especially (I think) on
Windows.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Cédric Villemain <cedric(at)2ndquadrant(dot)fr>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Robert Haas <robertmhaas(at)gmail(dot)com>, Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Subject: Re: VACUUM ANALYZE is faster than ANALYZE?
Date: 2012-02-25 20:03:18
Message-ID: 201202252103.19158.cedric@2ndquadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Le mercredi 22 février 2012 20:12:35, Pavel Stehule a écrit :
> 2012/2/22 Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>:
> > Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> >> usual pattern in our application is
> >>
> >> create table xx1 as select ....
> >> analyze xx1
> >> create table xx2 as select .... from xx1, ....
> >> analyze xx2
> >> create table xx3 as select ... from xx3, ....
> >> analyze xx3
> >> create table xx4 as select ... from xx1, ...
> >>
> >> tables xx** are use as cache.
> >>
> >> so we have to refresh statistic early.
> >>
> >> in this situation - and I found so in this case VACUUM ANALYZE is
> >> faster (30%) than ANALYZE. Size of xx** is usually between 500Kb
> >> and 8Kb
> >>
> >> This is not usual pattern for OLTP - Application is strictly OLAP.
> >
> > Is the VACUUM ANALYZE step faster, or is the overall job faster if
> > VACUUM ANALYZE is run? You may be running into the need to rewrite
> > pages at an inopportune time or order without the VACUUM. Have you
> > tried getting a time VACUUM FREEZE ANALYZE on these cache tables
> > instead of plain VACUUM ANALYZE?
> >
> > -Kevin
>
> vacuum freeze analyze is slower as expected. vacuum analyze is little
> bit faster or same in any step then analyze.
>
> I expected so just analyze should be significantly faster and it is not.
>
> Tom's demonstration is enough for me. ANALYZE doesn't read complete
> table, but uses random IO. VACUUM ANALYZE reads complete table, but it
> uses seq IO and vacuum is fast (because it does nothing) in our case.

VACUUM does read the 1st block to be sure readahead is done when ANALYSE does
not.
For ANALYZE, maybe it is interesting to issue a read on the first block or use
POSIX_FADVISE to (try) to force a readahead of the table when it is small
enough (so ANALYSE can start working while blocks are read and put in cache).

That's being said, I am surprised that the pattern "create table...analyze
create table analyze" of such smalls ones make the data being flush from OS
cache so quickly that they need to be read again from disk.
Pavel, can you check the cache status of the tables just before the analyze ?
(you can use OS tools or pgfincore extension for that)

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation