From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 18:48:12
Message-ID: 51A79EEC.6070603@agliodbs.com
Lists: pgsql-hackers

Folks,

There are currently some great ideas bouncing around about eliminating the
overhead associated with FREEZE. However, I wanted to take a step back
and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
Otherwise, we're liable to repeat the 8.4 problem of making one
operation better (background vacuum) while making another one worse
(freezing).

The big, big picture is this:

90% of our users need to think about VACUUM/ANALYZE
at least 10% of the time
and 10% of our users need to think about it
almost 90% of the time.

That's considerably better than was the case 5 years ago, when vacuum
management was a daily or weekly responsibility for nearly 100% of our
users, but it's still not good enough. Our target should be that only
those with really unusual setups should have to *ever* think about
vacuum and analyze.

So I've set up a wiki page to document the various problems that force
users to think about vacuum and analyze and try to troubleshoot it:

https://wiki.postgresql.org/wiki/VacuumHeadaches

We can also collect suggested solutions here. I'm looking to create a
long-term development target which removes most of these vacuum
headaches over the next 3 or 4 releases, without making the unremoved
headaches significantly worse.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Thom Brown <thom(at)linux(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:01:01
Message-ID: CAA-aLv7Uxpttr85fNxTvkTDuuW9vwKkLMGTLZ+sHie2PxGP2kQ@mail.gmail.com
Lists: pgsql-hackers

On 30 May 2013 19:48, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Folks,
>
> There are currently some great ideas bouncing around about eliminating the
> overhead associated with FREEZE. However, I wanted to take a step back
> and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
> Otherwise, we're liable to repeat the 8.4 problem of making one
> operation better (background vacuum) while making another one worse
> (freezing).
>
> The big, big picture is this:
>
> 90% of our users need to think about VACUUM/ANALYZE
> at least 10% of the time
> and 10% of our users need to think about it
> almost 90% of the time.
>
> That's considerably better than was the case 5 years ago, when vacuum
> management was a daily or weekly responsibility for nearly 100% of our
> users, but it's still not good enough. Our target should be that only
> those with really unusual setups should have to *ever* think about
> vacuum and analyze.
>
> So I've set up a wiki page to document the various problems that force
> users to think about vacuum and analyze and try to troubleshoot it:
>
> https://wiki.postgresql.org/wiki/VacuumHeadaches
>
> We can also collect suggested solutions here. I'm looking to create a
> long-term development target which removes most of these vacuum
> headaches over the next 3 or 4 releases, without making the unremoved
> headaches significantly worse.

Great collection of issues.

I'm not sure I understand this:

"Problem: As of 9.3, there's a significant benefit to vacuum freezing
tables early so that index-only scan is enabled, since freezing also
updates the visibility map. However, with default settings, such
freezing only happens for data which is very old. This means that
index-only scan is less effective than it could be for tables which
have relatively infrequent updates and deletes."

Why specifically VACUUM FREEZE rather than regular VACUUM? I thought
regular VACUUM updated the visibility map too? And why as of 9.3
instead of 9.2?

--
Thom


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:08:56
Message-ID: 20130530190856.GG14029@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2013-05-30 11:48:12 -0700, Josh Berkus wrote:
> There are currently some great ideas bouncing around about eliminating the
> overhead associated with FREEZE. However, I wanted to take a step back
> and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
> Otherwise, we're liable to repeat the 8.4 problem of making one
> operation better (background vacuum) while making another one worse
> (freezing).

In what way did 8.4 make freezing worse? I can't remember any new
problems there.

I agree that we need to be careful not to make things worse...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:10:01
Message-ID: 51A7A409.7040801@agliodbs.com
Lists: pgsql-hackers


> "Problem: As of 9.3, there's a significant benefit to vacuum freezing
> tables early so that index-only scan is enabled, since freezing also
> updates the visibility map. However, with default settings, such
> freezing only happens for data which is very old. This means that
> index-only scan is less effective than it could be for tables which
> have relatively infrequent updates and deletes."
>
> Why specifically VACUUM FREEZE rather than regular VACUUM? I thought
> regular VACUUM updated the visibility map too? And why as of 9.3
> instead of 9.2?

It should say 9.2; that was a typo.

The all-visible bit only gets set if there was some reason for VACUUM
to visit the page anyway, no? Which means that an insert-only or
insert-mostly table doesn't get marked all-visible until FREEZE. And
insert-only tables are usually very large, and thus really *need*
index-only scan.

Hmmm. I should rewrite that item entirely. It has nothing to do with
FREEZE, really.
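
For what it's worth, one way to watch this is relallvisible in
pg_class, which VACUUM (and, approximately, ANALYZE) maintains as its
count of all-visible pages; it's available since 9.2. A sketch:

    -- Fraction of each table's pages currently marked all-visible.
    SELECT relname,
           relpages,
           relallvisible,
           round(100.0 * relallvisible / greatest(relpages, 1), 1)
               AS pct_allvisible
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY relpages DESC
    LIMIT 10;

On an insert-only table that percentage tends to sit near zero until a
freeze pass finally visits the pages.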

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:11:23
Message-ID: 51A7A45B.1070700@agliodbs.com
Lists: pgsql-hackers


> In what way did 8.4 make freezing worse? I can't remember any new
> problems there.

Before the Visibility Map, we always vacuumed all pages in a relation
when it was vacuumed at all. This means that we froze tuples at
vacuum_freeze_min_age, not at autovacuum_freeze_max_age, which is when
we do it post-8.4.
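
For reference, you can watch how close each database is getting to that
forced, full-scan vacuum (a sketch):

    -- Distance to autovacuum_freeze_max_age, at which point an
    -- anti-wraparound vacuum is forced on the tables holding it back.
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;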

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:16:13
Message-ID: 20130530191613.GA24645@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-05-30 12:11:23 -0700, Josh Berkus wrote:
>
> > In what way did 8.4 make freezing worse? I can't remember any new
> > problems there.
>
> Before the Visibility Map, we always vacuumed all pages in a relation
> when it was vacuumed at all. This means that we froze tuples at
> vacuum_freeze_min_age, not at autovacuum_freeze_max_age, which is when
> we do it post-8.4.

If we have reason to vacuum the relation we do it at
vacuum_freeze_table_age, not only at autovacuum_freeze_max_age. The
difference is that the latter triggers a vacuum, while the former only
changes a partial vacuum into a full one.
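
To illustrate the two thresholds side by side (a sketch, comparing each
table's age against the current settings):

    -- xid_age >= vacuum_freeze_table_age: an otherwise-partial vacuum
    -- is upgraded to a full-table scan.
    -- xid_age >= autovacuum_freeze_max_age: a vacuum is forced even if
    -- nothing else would have triggered one.
    SELECT relname,
           age(relfrozenxid) AS xid_age,
           current_setting('vacuum_freeze_table_age')::int AS full_scan_at,
           current_setting('autovacuum_freeze_max_age')::int AS forced_at
    FROM pg_class
    WHERE relkind = 'r'
    ORDER BY xid_age DESC
    LIMIT 10;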

Calling that behaviour unconditionally worse is, err, interesting...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:17:14
Message-ID: 20130530191714.GH14029@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-05-30 20:01:01 +0100, Thom Brown wrote:
> "Problem: As of 9.3, there's a significant benefit to vacuum freezing
> tables early so that index-only scan is enabled, since freezing also
> updates the visibility map. However, with default settings, such
> freezing only happens for data which is very old. This means that
> index-only scan is less effective than it could be for tables which
> have relatively infrequent updates and deletes."

> Why specifically VACUUM FREEZE rather than regular VACUUM? I thought
> regular VACUUM updated the visibility map too?

It does. It's after all what it uses to decide which parts of the table
to scan if not doing a full table vacuum.

> And why as of 9.3 instead of 9.2?

Maybe because 9.3 updates the VM more quickly than earlier versions, by
checking whether all tuples are visible after we've actually removed
the dead tuples.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 19:18:29
Message-ID: 51A7A605.3050700@agliodbs.com
Lists: pgsql-hackers


> If we have reason to vacuum the relation we do it at
> vacuum_freeze_table_age, not only at autovacuum_freeze_max_age. The
> difference is that the latter triggers a vacuum, while the former only
> changes a partial vacuum into a full one.
>
> Calling that behaviour unconditionally worse is, err, interesting...

*overall* it's better. But as far as FREEZE itself is concerned, it's
worse.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-05-30 20:07:09
Message-ID: 20130530200709.GJ14029@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-05-30 12:18:29 -0700, Josh Berkus wrote:
>
> > If we have reason to vacuum the relation we do it at
> > vacuum_freeze_table_age, not only at autovacuum_freeze_max_age. The
> > difference is that the latter triggers a vacuum, while the former only
> > changes a partial vacuum into a full one.
> >
> > Calling that behaviour unconditionally worse is, err, interesting...
>
> *overall* it's better. But as far as FREEZE itself is concerned, it's
> worse.

I am not trying to give you a hard time, but I really can't follow. In
<8.4 we only froze tuples that were older than vacuum_freeze_min_age,
just as today (although the default was higher then than it is now).
100 million transactions is long enough that you will almost certainly
be in a different checkpoint cycle when freezing than when initially
writing the tuple's buffer. So independent of when the buffer is frozen
(be it because a) we always scan the whole relation, b) a partial
vacuum gets upgraded to a full one due to vacuum_freeze_table_age, or
c) an anti-wraparound vacuum runs) we will usually write a buffer
multiple times.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-01 17:49:43
Message-ID: CA+U5nMLwMOfCx04mNbtwk55EFFhCrZMC_KifcEX_Fvaxyo6LWg@mail.gmail.com
Lists: pgsql-hackers

On 30 May 2013 19:48, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> There are currently some great ideas bouncing around about eliminating the
> overhead associated with FREEZE. However, I wanted to take a step back
> and take a look at the big picture for VACUUM, FREEZE and ANALYZE.

That is a very commendable approach. We should do that more often.

> The big, big picture is this:
>
> 90% of our users need to think about VACUUM/ANALYZE
> at least 10% of the time
> and 10% of our users need to think about it
> almost 90% of the time.

When you say stuff like that, you should add "speculating from my
personal experience". People might get the impression you'd measured
this somehow and it could confuse the issue if you try to assemble a
high level viewpoint and then add in factoids that are just opinions.

We should strive to measure such things.

> That's considerably better than was the case 5 years ago, when vacuum
> management was a daily or weekly responsibility for nearly 100% of our
> users, but it's still not good enough. Our target should be that only
> those with really unusual setups should have to *ever* think about
> vacuum and analyze.

I think that's where we already are given that 1000s of users have
quite small databases.

The problem increases with scale. Larger databases have bigger
problems and make it easier to notice things are happening.

I think you should mention that the evidence for these issues is
anecdotal and take careful notes of the backgrounds in which they
occurred. Saying things occur in all cases wouldn't be accurate or
helpful to their resolution.

We should be seeking to contrast this against other databases to see
if we are better or worse than other systems. For example, recording
the moans of someone who is currently managing a 1 TB database, yet
hasn't ever managed anything else that big, is less valuable than a
balanced, experienced viewpoint (if such exists).

Anyway, I support this approach, just wanted to make sure we do it in
sufficient detail to be useful.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-01 21:56:22
Message-ID: CA+TgmoZ9tQ32b4R+2wfn7aYppw2rak47QfnnVPH87trKjUpfBQ@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 1, 2013 at 1:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 30 May 2013 19:48, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> There are currently some great ideas bouncing around about eliminating the
>> overhead associated with FREEZE. However, I wanted to take a step back
>> and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
>
> That is a very commendable approach. We should do that more often.
>
>> The big, big picture is this:
>>
>> 90% of our users need to think about VACUUM/ANALYZE
>> at least 10% of the time
>> and 10% of our users need to think about it
>> almost 90% of the time.
>
> When you say stuff like that, you should add "speculating from my
> personal experience". People might get the impression you'd measured
> this somehow and it could confuse the issue if you try to assemble a
> high level viewpoint and then add in factoids that are just opinions.
>
> We should strive to measure such things.
>
>> That's considerably better than was the case 5 years ago, when vacuum
>> management was a daily or weekly responsibility for nearly 100% of our
>> users, but it's still not good enough. Our target should be that only
>> those with really unusual setups should have to *ever* think about
>> vacuum and analyze.
>
> I think that's where we already are given that 1000s of users have
> quite small databases.
>
> The problem increases with scale. Larger databases have bigger
> problems and make it easier to notice things are happening.
>
> I think you should mention that the evidence for these issues is
> anecdotal and take careful notes of the backgrounds in which they
> occurred. Saying things occur in all cases wouldn't be accurate or
> helpful to their resolution.
>
> We should be seeking to contrast this against other databases to see
> if we are better or worse than other systems. For example, recording
> the moans of someone who is currently managing a 1 TB database, yet
> hasn't ever managed anything else that big, is less valuable than a
> balanced, experienced viewpoint (if such exists).
>
> Anyway, I support this approach, just wanted to make sure we do it in
> sufficient detail to be useful.

I agree with all that. I don't have any data either, but I agree that
AFAICT it seems to mostly be a problem for large (terabyte-scale)
databases, or ones that are dreadfully short of I/O bandwidth. AWS,
I'm looking at you.

It would be interesting to make a list of what other issues people
have seen using PostgreSQL on very large data sets. Complaints I've
heard include:

1. Inexplicable failure of the planner to use indexes on very large
tables, preferring an obviously-stupid sequential scan. This might be
fixed by the latest index-size fudge factor work.

2. Lack of concurrent DDL.

On VACUUM and ANALYZE specifically, I'd have to say that the most
common problems I encounter are (a) anti-wraparound vacuums kicking in
at inconvenient times and eating up too many resources and (b) users
making ridiculous settings changes to avoid the problems caused by
anti-wraparound vacuums kicking in at inconvenient times and eating up
too many resources. The changes we've been discussing elsewhere may
not completely solve this problem, because we'll still have to read
all pages that aren't yet all-visible... but they should surely help.
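
For what it's worth, the sane version of those settings changes is
usually to schedule the freeze yourself during a quiet window, so the
forced pass never fires at a bad time. A sketch, with a hypothetical
table name:

    -- Reset age(relfrozenxid) on our own schedule instead of waiting
    -- for autovacuum_freeze_max_age to force the issue at peak load.
    VACUUM (FREEZE, VERBOSE) big_table;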

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-02 18:44:04
Message-ID: CAMkU=1xCroGg_poyjO307XHZhzPVicGSeuH_CFKZkC+n-vk4pQ@mail.gmail.com
Lists: pgsql-hackers

On Saturday, June 1, 2013, Robert Haas wrote:

>
> I agree with all that. I don't have any data either, but I agree that
> AFAICT it seems to mostly be a problem for large (terabyte-scale)
> databases, or ones that are dreadfully short of I/O bandwidth. AWS,
> I'm looking at you.
>
> It would be interesting to make a list of what other issues people
> have seen using PostgreSQL on very large data sets. Complaints I've
> heard include:
>
> 1. Inexplicable failure of the planner to use indexes on very large
> tables, preferring an obviously-stupid sequential scan. This might be
> fixed by the latest index-size fudge factor work.
>
> 2. Lack of concurrent DDL.
>
> On VACUUM and ANALYZE specifically, I'd have to say that the most
> common problems I encounter are (a) anti-wraparound vacuums kicking in
> at inconvenient times and eating up too many resources and (b) users
> making ridiculous settings changes to avoid the problems caused by
> anti-wraparound vacuums kicking in at inconvenient times and eating up
> too many resources.

Do we know why anti-wraparound uses so many resources in the first place?
The default settings seem to be quite conservative to me, even for a
system that has only a single 5400 rpm hdd (and even more so for any real
production system that would be used for a many-GB database).
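
The back-of-the-envelope arithmetic with the defaults of that era bears
this out (a sketch):

    -- vacuum_cost_limit = 200 credits per autovacuum_vacuum_cost_delay
    -- of 20 ms => 10,000 credits/sec. A page miss costs 10 credits,
    -- dirtying a page costs 20, so the ceiling is roughly:
    SELECT (200 / 0.020)          AS credits_per_sec,   -- 10,000
           (200 / 0.020) / 10 * 8 AS read_kb_per_sec,   -- ~8 MB/s
           (200 / 0.020) / 20 * 8 AS dirty_kb_per_sec;  -- ~4 MB/s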

I wonder if there is something simple but currently unknown going on which
is causing it to damage performance out of all proportion to the resources
it ought to be using.

Cheers,

Jeff


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-02 22:20:30
Message-ID: CA+TgmoZ1XZThxegpeEz-7p9ZA+qAzVeHUDocytN8brHDd0v0Dw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 2, 2013 at 2:44 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> Do we know why anti-wraparound uses so many resources in the first place?
> The default settings seem to be quite conservative to me, even for a system
> that has only a single 5400 rpm hdd (and even more so for any real
> production system that would be used for a many-GB database).
>
> I wonder if there is something simple but currently unknown going on which
> is causing it to damage performance out of all proportion to the resources
> it ought to be using.

I can't rule that out. Personally, I've always attributed it to the
fact that it's (a) long and (b) I/O-intensive. But it's not
impossible there could also be bugs lurking.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Ants Aasma <ants(at)cybertec(dot)at>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 08:27:57
Message-ID: CA+CSw_s-j0BtWJgeaQibjdSU=9+ZkASC2iA9a2HoT87cd09XjQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 1:20 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Jun 2, 2013 at 2:44 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> Do we know why anti-wraparound uses so many resources in the first place?
>> The default settings seem to be quite conservative to me, even for a system
>> that has only a single 5400 rpm hdd (and even more so for any real
>> production system that would be used for a many-GB database).
>>
>> I wonder if there is something simple but currently unknown going on which
>> is causing it to damage performance out of all proportion to the resources
>> it ought to be using.
>
> I can't rule that out. Personally, I've always attributed it to the
> fact that it's (a) long and (b) I/O-intensive. But it's not
> impossible there could also be bugs lurking.

It could be related to the OS. I have no evidence for or against, but
it's possible that OS write-out routines defeat the careful cost based
throttling that PostgreSQL does by periodically dumping a large
portion of dirty pages into the write queue at once. That does nasty
things to query latencies as evidenced by the work on checkpoint
spreading.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 12:03:26
Message-ID: 51AC860E.2090705@2ndquadrant.com
Lists: pgsql-hackers

On 06/02/2013 05:56 AM, Robert Haas wrote:
> On Sat, Jun 1, 2013 at 1:49 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 30 May 2013 19:48, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>
>>> There are currently some great ideas bouncing around about eliminating the
>>> overhead associated with FREEZE. However, I wanted to take a step back
>>> and take a look at the big picture for VACUUM, FREEZE and ANALYZE.
>>
>> That is a very commendable approach. We should do that more often.
>>
>>> The big, big picture is this:
>>>
>>> 90% of our users need to think about VACUUM/ANALYZE
>>> at least 10% of the time
>>> and 10% of our users need to think about it
>>> almost 90% of the time.
>>
>> When you say stuff like that, you should add "speculating from my
>> personal experience". People might get the impression you'd measured
>> this somehow and it could confuse the issue if you try to assemble a
>> high level viewpoint and then add in factoids that are just opinions.
>>
>> We should strive to measure such things.
>>
>>> That's considerably better than was the case 5 years ago, when vacuum
>>> management was a daily or weekly responsibility for nearly 100% of our
>>> users, but it's still not good enough. Our target should be that only
>>> those with really unusual setups should have to *ever* think about
>>> vacuum and analyze.
>>
>> I think that's where we already are given that 1000s of users have
>> quite small databases.
>>
>> The problem increases with scale. Larger databases have bigger
>> problems and make it easier to notice things are happening.
>>
>> I think you should mention that the evidence for these issues is
>> anecdotal and take careful notes of the backgrounds in which they
>> occurred. Saying things occur in all cases wouldn't be accurate or
>> helpful to their resolution.
>>
>> We should be seeking to contrast this against other databases to see
>> if we are better or worse than other systems. For example, recording
>> the moans of someone who is currently managing a 1 TB database, yet
>> hasn't ever managed anything else that big, is less valuable than a
>> balanced, experienced viewpoint (if such exists).
>>
>> Anyway, I support this approach, just wanted to make sure we do it in
>> sufficient detail to be useful.
>
> I agree with all that. I don't have any data either, but I agree that
> AFAICT it seems to mostly be a problem for large (terabyte-scale)
> databases, or ones that are dreadfully short of I/O bandwidth. AWS,
> I'm looking at you.
>
> It would be interesting to make a list of what other issues people
> have seen using PostgreSQL on very large data sets. Complaints I've
> heard include:
>
> 1. Inexplicable failure of the planner to use indexes on very large
> tables, preferring an obviously-stupid sequential scan. This might be
> fixed by the latest index-size fudge factor work.

I've seen cases on Stack Overflow and elsewhere in which disk merge
sorts perform vastly better than in-memory quicksort, so the user
benefited from greatly *lowering* work_mem.
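
It's easy enough to reproduce the comparison for a given workload (a
sketch; the table and column are hypothetical):

    -- Compare plans and timings at two work_mem settings.
    SET work_mem = '1GB';
    EXPLAIN ANALYZE SELECT * FROM events ORDER BY payload;  -- quicksort
    SET work_mem = '4MB';
    EXPLAIN ANALYZE SELECT * FROM events ORDER BY payload;  -- external merge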

> (b) users
> making ridiculous settings changes to avoid the problems caused by
> anti-wraparound vacuums kicking in at inconvenient times and eating up
> too many resources.

Some recent experiences I've had have also brought home to me that vacuum
problems are often of the user's own making.

"My database is slow"
->
"This autovacuum thing is using up lots of I/O and CPU, I'll increase
this delay setting here"
->
"My database is slower"
->
"Maybe I didn't solve the autovacuum thing, I'll just turn it off"
->
"My database is barely working"
->
"I'll whack in some manual VACUUM cron jobs during low load maintenance
hours and hope that keeps the worst of the problem away, that's what
random forum posts on the Internet say to do".
-> "oh my, why did my DB just do an emergency shutdown?"

Vacuum being more able to operate in a feedback loop driven by bloat
statistics might be quite valuable, but I'm also wondering if there's
any remotely feasible way to more usefully alert users when they're
having table bloat issues and vacuum isn't coping. Particularly for
cases where autovacuum is working but being impaired by locking.
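
As a crude first cut, something like this could drive such an alert (a
sketch; the thresholds are made up, and real bloat estimation needs
more than the stats collector provides):

    -- Tables where dead tuples are piling up despite autovacuum.
    SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
    FROM pg_stat_user_tables
    WHERE n_dead_tup > 10000
      AND n_dead_tup > 0.2 * greatest(n_live_tup, 1)
    ORDER BY n_dead_tup DESC;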

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 13:34:58
Message-ID: 1370266498.2693.YahooMailNeo@web162905.mail.bf1.yahoo.com
Lists: pgsql-hackers

Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 06/02/2013 05:56 AM, Robert Haas wrote:

>> I agree with all that.  I don't have any data either, but I agree that
>> AFAICT it seems to mostly be a problem for large (terabyte-scale)
>> databases, or ones that are dreadfully short of I/O bandwidth.  AWS,
>> I'm looking at you.

> I've seen cases on Stack Overflow and elsewhere in which disk merge
> sorts perform vastly better than in-memory quicksort, so the user
> benefited from greatly *lowering* work_mem.

I have seen this a few times, too.  It would be interesting to
characterize the conditions under which this is the case.

>> (b) users
>> making ridiculous settings changes to avoid the problems caused by
>> anti-wraparound vacuums kicking in at inconvenient times and eating up
>> too many resources.

Where I hit a nightmare scenario with an anti-wraparound
autovacuum, personally, was after an upgrade using pg_dump piped to
psql.  At a high OLTP transaction load time (obviously the most
likely time for it to kick in, because it is triggered by xid
consumption), it started to READ AND REWRITE every heap page of
every table.  This overwhelmed the battery-backed write cache,
causing a series of "freezes" for a few minutes at a time, raising
a very large number of end-user complaints.  This is when I started
insisting on a VACUUM FREEZE ANALYZE after any bulk load before it
was considered complete and the database brought online for
production use.
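
In concrete terms, that post-load step is simply:

    -- Run once after the pg_dump | psql restore, before the database
    -- goes live: rewrites and freezes everything while nobody waits.
    VACUUM FREEZE ANALYZE;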

> Some recent experiences I've had have also brought home to me that vacuum
> problems are often of the user's own making.
>
> "My database is slow"
> ->
> "This autovacuum thing is using up lots of I/O and CPU, I'll increase
> this delay setting here"
> ->
> "My database is slower"
> ->
> "Maybe I didn't solve the autovacuum thing, I'll just turn it
> off"
> ->
> "My database is barely working"
> ->
> "I'll whack in some manual VACUUM cron jobs during low load maintenance
> hours and hope that keeps the worst of the problem away, that's what
> random forum posts on the Internet say to do".
> -> "oh my, why did my DB just do an emergency shutdown?"

Yeah, I've seen exactly that sequence, and some variations on it
quite often.  In fact, when I was first using PostgreSQL I got as
far as "Maybe I didn't solve the autovacuum thing" but instead of
"I'll just turn it off" my next step was "I wonder what would
happen if I tried making it *more* aggressive so that it didn't
have so much work to do each time it fired?"  Of course, that
vastly improved things.  I have found it surprisingly difficult to
convince other people to try that, though.
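
For the record, "more aggressive" meant roughly this kind of thing,
scoped to the hot tables (a sketch; the table name and values here are
made up):

    -- Vacuum after ~2% churn instead of the default 20%, and throttle
    -- less, so each run is small and cheap.
    ALTER TABLE hot_table SET (
        autovacuum_vacuum_scale_factor = 0.02,
        autovacuum_analyze_scale_factor = 0.01,
        autovacuum_vacuum_cost_delay = 5
    );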

I have seen people so convinced that vacuum (and particularly
autovacuum) are *so* evil that they turn off autovacuum and monitor
the freeze status of their tables and databases so that they can
run VACUUM "just in time" to prevent the emergency shutdown.
Obviously, this isn't great for their performance.  :-(

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 13:42:48
Message-ID: 20130603134248.GJ3955@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-02 11:44:04 -0700, Jeff Janes wrote:
> Do we know why anti-wraparound uses so many resources in the first place?
> The default settings seem to be quite conservative to me, even for a
> system that has only a single 5400 rpm hdd (and even more so for any real
> production system that would be used for a many-GB database).

I guess the point is that nobody can actually run a bigger OLTP database
successfully with the default settings. Usually that will end up with a)
huge amounts of bloat in the tables autovac doesn't scan first, or b)
forced shutdowns because autovac doesn't freeze quickly enough.

The suggestion that frequently gets made is just to disable autovac's
cost limits entirely because of that.
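
That suggestion usually translates to something like this (a sketch;
the table name is made up, and it can also be set globally in
postgresql.conf):

    -- The commonly-suggested "just turn the throttling off", scoped
    -- to one large, busy table.
    ALTER TABLE big_busy_table SET (autovacuum_vacuum_cost_delay = 0);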

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 17:15:55
Message-ID: 51ACCF4B.50208@agliodbs.com
Lists: pgsql-hackers


> I agree with all that. I don't have any data either, but I agree that
> AFAICT it seems to mostly be a problem for large (terabyte-scale)
> databases, or ones that are dreadfully short of I/O bandwidth. AWS,
> I'm looking at you.

Well, at this point, numerically I'd bet that more than 50% of our users
are on AWS, some other cloud, or some kind of iSCSI storage ... some
place where IO sucks. It's How Things Are Done Now.

Speaking for my own clientele, people run into issues, or think they
have issues, with autovacuum at databases as small as 100GB, as long as
they have sufficient write throughput. One really pathological case I
had to troubleshoot was a database which was only 200MB in size! (this
database contained counts of things, and was updated 10,000 times per
second).

Anyway, my goal with that wiki page -- which is on the wiki so that
others can add to it -- is to get all of the common chronic issues on
the table so that we don't inadvertently make one problem worse while
making another one better. Some of the solutions to FREEZE being
bandied around seemed likely to do that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 17:28:07
Message-ID: 51ACD227.7020806@agliodbs.com
Lists: pgsql-hackers

Jeff,

> Do we know why anti-wraparound uses so many resources in the first place?
> The default settings seem to be quite conservative to me, even for a
> system that has only a single 5400 rpm hdd (and even more so for any real
> production system that would be used for a many-GB database).
>
> I wonder if there is something simple but currently unknown going on which
> is causing it to damage performance out of all proportion to the resources
> it ought to be using.

Does anti-wraparound vacuum (AWAV) write synchronously? If so, there's
a potential whole world of hurt there.

Otherwise, the effect you're seeing is just blowing out various caches:
the CPU cache, storage cache, and filesystem cache. While we can (and
do) prevent vacuum from blowing out shared_buffers, we can't do much
about the others.

Also, locking while it does its work.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 17:46:26
Message-ID: CA+TgmobGXcjWGwMpr434rRTK8F_H3rF8wrNJEU7dJZd1Us_ntg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 1:28 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Does anti-wraparound vacuum (AWAV) write synchronously? If so, there's
> a potential whole world of hurt there.

Not any more so than anything else ... although it probably generates a
very high percentage of full-page images (FPIs), which might lead to
lots of checkpointing.

> Also, locking while it does its work.

Eh?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 18:00:38
Message-ID: 51ACD9C6.1090203@agliodbs.com
Lists: pgsql-hackers


>> Also, locking while it does its work.
>
> Eh?

Even if we're doing lazy vacuum, we have to lock a few pages at a time
of each table. This does result in response time delays on the current
workload, which can be quite bad if it's a highly contended table already.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 18:12:00
Message-ID: 20130603181200.GA12323@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-03 11:00:38 -0700, Josh Berkus wrote:
>
> >> Also, locking while it does its work.
> >
> > Eh?
>
> Even if we're doing lazy vacuum, we have to lock a few pages at a time
> of each table. This does result in response time delays on the current
> workload, which can be quite bad if it's a highly contended table already.

We don't really lock more pages at a time than normal DML does. 1 heap
page at a time, possibly several index pages at once.

There's something related which can cause problems: we require cleanup
locks on the page to be able to repair fragmentation, which makes
*vacuum* wait for all clients to release their page pins.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 18:56:35
Message-ID: 51ACE6E3.4000309@agliodbs.com
Lists: pgsql-hackers

On 06/03/2013 11:12 AM, Andres Freund wrote:
> On 2013-06-03 11:00:38 -0700, Josh Berkus wrote:
>>
>>>> Also, locking while it does its work.
>>>
>>> Eh?
>>
>> Even if we're doing lazy vacuum, we have to lock a few pages at a time
>> of each table. This does result in response time delays on the current
>> workload, which can be quite bad if it's a highly contended table already.
>
> We don't really lock more pages at a time than normal DML does. 1 heap
> page at a time, possibly several index pages at once.

Really? I thought vacuum held onto its locks until it reached
vacuum_cost. If it doesn't, then maybe we should adjust the default for
vacuum_cost_limit upwards.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 19:48:26
Message-ID: 20130603194826.GA15888@svana.org
Lists: pgsql-hackers

On Mon, Jun 03, 2013 at 11:27:57AM +0300, Ants Aasma wrote:
> > I can't rule that out. Personally, I've always attributed it to the
> > fact that it's (a) long and (b) I/O-intensive. But it's not
> > impossible there could also be bugs lurking.
>
> It could be related to the OS. I have no evidence for or against, but
> it's possible that OS write-out routines defeat the careful cost based
> throttling that PostgreSQL does by periodically dumping a large
> portion of dirty pages into the write queue at once. That does nasty
> things to query latencies as evidenced by the work on checkpoint
> spreading.

In other contexts I've run into issues relating to large continuous
writes stalling. The issue is basically that the Linux kernel allows
(by default) writes to pile up until they reach 5% of physical memory
before deciding that the sucker who wrote the last block becomes
responsible for writing the whole lot out. At full speed of course.
Depending on the amount of memory and the I/O speed of your disks this
could take a while, and interfere with other processes.

This leads to extremely bursty I/O behaviour.

The solution, as usual, is to make it more aggressive, so the
kernel background writer triggers at 1% memory.

I'm not saying that's the problem here, but it is an example of a
situation where the write queue can become very large very quickly.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 21:26:08
Message-ID: CA+TgmoZ4O76kp=oD_446KzZ4RfSTRik_pp9wJY=VLGJDchHTfA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 2:56 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Really? I thought vacuum held onto its locks until it reached
> vacuum_cost. If it doesn't, then maybe we should adjust the default for
> vacuum_cost_limit upwards.

That would be completely insane.

Or in other words, no, it doesn't do anything like that. :-)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 21:27:01
Message-ID: CAM3SWZSJPgewee_dTicq3PuZgrY+63SbbzATHTr+rdg4ykOp-w@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> I've seen cases on Stack Overflow and elsewhere in which disk merge
> sorts perform vastly better than in-memory quicksort, so the user
> benefited from greatly *lowering* work_mem.

I've heard of that happening on Oracle, when the external sort is
capable of taking advantage of I/O parallelism, but I have a pretty
hard time believing that it could happen with Postgres under any
circumstances. Maybe if someone was extraordinarily unlucky and
happened to hit quicksort's O(n ^ 2) worst case it could happen, but
we take various measures that make that very unlikely. It might also
have something to do with our "check for pre-sorted input" [1], but
I'm still skeptical.

[1] http://www.postgresql.org/message-id/CAEYLb_Xn4-6f1ofsf2qduf24dDCVHbQidt7JPpdL_RiT1zBJ6A@mail.gmail.com

--
Peter Geoghegan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 21:35:33
Message-ID: CA+TgmoYCd8Gve126F_vppPi-eEz6Wo+WGKB6=k1S9AGyabCxWg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 3:48 PM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> On Mon, Jun 03, 2013 at 11:27:57AM +0300, Ants Aasma wrote:
>> > I can't rule that out. Personally, I've always attributed it to the
>> > fact that it's (a) long and (b) I/O-intensive. But it's not
>> > impossible there could also be bugs lurking.
>>
>> It could be related to the OS. I have no evidence for or against, but
>> it's possible that OS write-out routines defeat the careful cost based
>> throttling that PostgreSQL does by periodically dumping a large
>> portion of dirty pages into the write queue at once. That does nasty
>> things to query latencies as evidenced by the work on checkpoint
>> spreading.
>
> In other contexts I've run into issues relating to large continuous
> writes stalling. The issue is basically that the Linux kernel allows
> (by default) writes to pile up until they reach 5% of physical memory
> before deciding that the sucker who wrote the last block becomes
> responsible for writing the whole lot out. At full speed of course.
> Depending on the amount of memory and the I/O speed of your disks this
> could take a while, and interfere with other processes.
>
> This leads to extremely bursty I/O behaviour.
>
> The solution, as usual, is to make it more aggressive, so the
> kernel background writer triggers at 1% memory.
>
> I'm not saying that's the problem here, but it is an example of a
> situation where the write queue can become very large very quickly.

Yeah. IMHO, the Linux kernel's behavior around the write queue is
flagrantly insane. The threshold for background writing really seems
like it ought to be zero. I can see why it makes sense to postpone
writing back dirty data if we're otherwise starved for I/O. But it
seems like the kernel is disposed to cache large amounts of dirty data
for an unbounded period of time even if the I/O system is completely
idle, and it's difficult to imagine what class of user would find that
behavior desirable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Ants Aasma <ants(at)cybertec(dot)at>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 22:16:49
Message-ID: 1370297809.95780.YahooMailNeo@web162906.mail.bf1.yahoo.com
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
>> On Mon, Jun 03, 2013 at 11:27:57AM +0300, Ants Aasma wrote:

>>> It could be related to the OS. I have no evidence for or against, but
>>> it's possible that OS write-out routines defeat the careful cost based
>>> throttling that PostgreSQL does by periodically dumping a large
>>> portion of dirty pages into the write queue at once. That does nasty
>>> things to query latencies as evidenced by the work on checkpoint
>>> spreading.
>>
>> In other contexts I've run into issues relating to large continuous
>> writes stalling.  The issue is basically that the Linux kernel allows
>> (by default) writes to pile up until they reach 5% of physical memory
>> before deciding that the sucker who wrote the last block becomes
>> responsible for writing the whole lot out.  At full speed of course.
>> Depending on the amount of memory and the I/O speed of your disks this
>> could take a while, and interfere with other processes.
>>
>> This leads to extremely bursty I/O behaviour.
>>
>> The solution, as usual, is to make it more aggressive, so the
>> kernel background writer triggers at 1% memory.
>>
>> I'm not saying that's the problem here, but it is an example of a
>> situation where the write queue can become very large very quickly.
>
> Yeah.  IMHO, the Linux kernel's behavior around the write queue is
> flagrantly insane.  The threshold for background writing really seems
> like it ought to be zero.  I can see why it makes sense to postpone
> writing back dirty data if we're otherwise starved for I/O.

I imagine the reason the OS guys would give for holding up on disk
writes for as long as possible would sound an awful lot like the
reason PostgreSQL developers give for doing it.  Keep in mind that
the OS doesn't know whether there might or might not be another
layer of caching (on a battery-backed RAID controller or SSD).
It's trying to minimize disk writes by waiting, to improve
throughput by collapsing duplicate writes and allowing the writes
to be performed in a more efficient order based on physical layout.

> But it seems like the kernel is disposed to cache large amounts
> of dirty data for an unbounded period of time even if the I/O
> system is completely idle,

It's not unbounded time.  Last I heard, the default was 30 seconds.

> and it's difficult to imagine what class of user would find that
> behavior desirable.

Well, certainly not a user of a database that keeps dirty pages
lingering for five minutes by default, and often increases that to
minimize full page writes.  IMO, our defaults for bgwriter are far
too passive.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Ants Aasma <ants(at)cybertec(dot)at>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 23:41:32
Message-ID: CAM3SWZRzVB0gTY1LNh4L+MBHythC_ez5isF6ARmnD7jfJHJX_A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 3:16 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
>> But it seems like the kernel is disposed to cache large amounts
>> of dirty data for an unbounded period of time even if the I/O
>> system is completely idle,
>
> It's not unbounded time. Last I heard, the default was 30 seconds.

I'm pretty sure it is unbounded. The VM documentation is a bit vague
on what dirty_expire_centisecs actually means, which is I presume
where this number comes from. It says:

"This tunable is used to define when dirty data is old enough to be eligible
for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
Data which has been dirty in-memory for longer than this interval will be
written out next time a pdflush daemon wakes up."

So I think a pdflush daemon won't necessarily wake up until
dirty_background_bytes or dirty_background_ratio have been exceeded,
regardless of this threshold. Am I mistaken?

https://www.kernel.org/doc/Documentation/sysctl/vm.txt
--
Peter Geoghegan


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-03 23:45:19
Message-ID: 51AD2A8F.6080302@2ndquadrant.com
Lists: pgsql-hackers

On 06/04/2013 05:27 AM, Peter Geoghegan wrote:
> On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> I've seen cases on Stack Overflow and elsewhere in which disk merge
>> sorts perform vastly better than in-memory quicksort, so the user
>> benefited from greatly *lowering* work_mem.
> I've heard of that happening on Oracle, when the external sort is
> capable of taking advantage of I/O parallelism, but I have a pretty
> hard time believing that it could happen with Postgres under any
> circumstances.
IIRC it's usually occurred with very expensive comparison operations.

I'll see if I can find one of the SO cases.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Ants Aasma <ants(at)cybertec(dot)at>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-04 00:04:59
Message-ID: 20130604000459.GB12323@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-03 16:41:32 -0700, Peter Geoghegan wrote:
> On Mon, Jun 3, 2013 at 3:16 PM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:
> >> But it seems like the kernel is disposed to cache large amounts
> >> of dirty data for an unbounded period of time even if the I/O
> >> system is completely idle,
> >
> > It's not unbounded time. Last I heard, the default was 30 seconds.
>
> I'm pretty sure it is unbounded. The VM documentation is a bit vague
> on what dirty_expire_centisecs actually means, which is I presume
> where this number comes from. It says:
>
> "This tunable is used to define when dirty data is old enough to be eligible
> for writeout by the pdflush daemons. It is expressed in 100'ths of a second.
> Data which has been dirty in-memory for longer than this interval will be
> written out next time a pdflush daemon wakes up."
>
> So I think a pdflush daemon won't necessarily wake up until
> dirty_background_bytes or dirty_background_ratio have been exceeded,
> regardless of this threshold. Am I mistaken?

Without having checked it again, AFAIR it should wake up every
dirty_writeback_centisecs, which is something like 5 seconds.

All that has pretty significantly changed - and imo improved! - in the
last year or so of kernel development. Unfortunately it will take a
while till we commonly see those kernels being used :(

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Ants Aasma <ants(at)cybertec(dot)at>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-04 00:19:24
Message-ID: 51AD328C.9090601@agliodbs.com
Lists: pgsql-hackers


> All that has pretty significantly changed - and imo improved! - in the
> last year or so of kernel development. Unfortunately it will take a
> while till we commonly see those kernels being used :(

... after being completely broken for 3.2 through 3.5.

We're actually using 3.9 in production on some machines, because we
couldn't take the IO disaster that is 3.4.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Greg Stark <stark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-05 14:40:11
Message-ID: CAM-w4HPkna59h_8fp5Y03e1-A-csRJuUOtSeMLWQmmt7wJqFtg@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 30, 2013 at 7:48 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> The big, big picture is this:
>
> 90% of our users need to think about VACUUM/ANALYZE
> at least 10% of the time
> and 10% of our users need to think about it
> almost 90% of the time.
>
> That's considerably better than was the case 5 years ago, when vacuum
> management was a daily or weekly responsibility for nearly 100% of our
> users,

FWIW I think this is not the right picture. I think the current
situation is an accurate description of the way things are and have
always been.

It's an arms race. We've raised the bar for how large and busy your
database has to be before vacuum becomes a pain, and users scale their
databases up. As long as we stay one step ahead of the users, 90% of
users won't have to think about vacuum/analyze much. There will always
be outliers.

When the visibility map went in, the argument was that wraparound was
so rare that it wasn't worth doubling the size of the visibility map
to have a second bit. If the table gets even a low amount of traffic,
nearly all blocks will need to be frozen anyway by that time. To do
something like the visibility map for freezing, we would need something
like a map that stores the high 8 bits of the oldest unfrozen xid in
each block. That would be a lot more complex and take a lot more space.

--
greg


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-06 20:38:08
Message-ID: CAMkU=1y0irGcbNRaxtUBJdNPHRt9AMorwgKM-DzAPZiwKTLwSw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 06/02/2013 05:56 AM, Robert Haas wrote:
>

> > (b) users
> > making ridiculous settings changes to avoid the problems caused by
> > anti-wraparound vacuums kicking in at inconvenient times and eating up
> > too many resources.
>
> Some recent experiences I've had have also brought home to me that vacuum
> problems are often of the user's own making.
>
> "My database is slow"
> ->
> "This autovacuum thing is using up lots of I/O and CPU, I'll increase
> this delay setting here"
>

Do you think this was the correct diagnosis but with the wrong action
taken, or was the diagnosis incorrect in the first place (i.e. it may be
using some IO and CPU, but that isn't what was causing the initial
problem)? And if the diagnosis was correct, was it causing problems under
default settings, or only because they already turned off the cost delay?

Cheers,

Jeff


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-06 20:56:54
Message-ID: CAMkU=1zPpgp7KKmW8_0aOgdS+WFy7Ypses6WNzqz-KZE=AY1UA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 6:34 AM, Kevin Grittner <kgrittn(at)ymail(dot)com> wrote:

>
>
> Where I hit a nightmare scenario with an anti-wraparound
> autovacuum, personally, was after an upgrade using pg_dump piped to
> psql. At a high OLTP transaction load time (obviously the most
> likely time for it to kick in, because it is triggered by xid
> consumption), it started to READ AND REWRITE every heap page of
> every table. This overwhelmed the battery-backed write cache,
> causing a series of "freezes" for a few minutes at a time, raising
> a very large number of end-user complaints.

But this is only after autovacuum_vacuum_cost_delay was already changed to
zero, right? It is hard to imagine the write cache being overwhelmed by
the default setting, or even by one substantially more aggressive than the
default but still not zero. Anti-wraparound vacuums should generate almost
purely sequential writes (at least if only btree indexes exist), so they
should clear very quickly.

> > "I'll whack in some manual VACUUM cron jobs during low load maintenance
> > hours and hope that keeps the worst of the problem away, that's what
> > random forum posts on the Internet say to do".
> > -> "oh my, why did my DB just do an emergency shutdown?"
>
> Yeah, I've seen exactly that sequence, and some variations on it
> quite often. In fact, when I was first using PostgreSQL I got as
> far as "Maybe I didn't solve the autovacuum thing" but instead of
> "I'll just turn it off" my next step was "I wonder what would
> happen if I tried making it *more* aggressive so that it didn't
> have so much work to do each time it fired?" Of course, that
> vastly improved things. I have found it surprisingly difficult to
> convince other people to try that, though.
>

What is it you changed? Either an anti-wraparound vacuum happens, or it
does not, so I'm not sure what you mean about making it more aggressive so
there is less to do. It always has to do the whole thing. Was it
autovacuum_vacuum_scale_factor that you changed?
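
For concreteness, the kind of change being discussed would look roughly
like this (table name and values hypothetical; both are standard storage
parameters):

    -- Fire autovacuum after ~2% of the table changes instead of the
    -- default 20%, so each run has much less work to do.
    ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.02);
    ALTER TABLE busy_table SET (autovacuum_analyze_scale_factor = 0.01);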

Cheers,

Jeff


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-06 21:30:46
Message-ID: CAMkU=1w5U27WSyXN9NLJs5j6FyLQGZnRV0xL6FFAnaOxsrTx9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> ->
> "I'll whack in some manual VACUUM cron jobs during low load maintenance
> hours and hope that keeps the worst of the problem away, that's what
> random forum posts on the Internet say to do".
> -> "oh my, why did my DB just do an emergency shutdown?"
>

This one doesn't make much sense to me, unless they mucked around with
autovacuum_freeze_max_age as well as turning autovacuum itself off (common
practice?). With the default setting of autovacuum_freeze_max_age, if it
can't complete the anti-wraparound before emergency shutdown with autovac
off, it probably would not have completed it with autovac on, either.
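
For what it's worth, the wraparound horizon can be watched directly
rather than guessed at - a sketch using the standard catalogs:

    -- Distance of each database from its frozen-xid horizon; autovacuum
    -- forces a wraparound vacuum once this passes
    -- autovacuum_freeze_max_age (default 200 million).
    SELECT datname, age(datfrozenxid) AS xid_age
    FROM pg_database
    ORDER BY xid_age DESC;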

Cheers,

Jeff


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-09 10:20:14
Message-ID: 51B456DE.9080805@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/07/2013 04:38 AM, Jeff Janes wrote:
> On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>> "My database is slow"
>> ->
>> "This autovacuum thing is using up lots of I/O and CPU, I'll increase
>> this delay setting here"
>
> Do you think this was the correct diagnosis but with the wrong action
> taken, or was the diagnosis incorrect in the first place (i.e. it may be
> using some IO and CPU, but that isn't what was causing the initial
> problem)? And if the diagnosis was correct, was it causing problems under
> default settings, or only because they already turned off the cost delay?

The problem is that vacuum running too slowly tends to result in table
and index bloat, which results in less efficient cache use, slower scans,
and generally worsening performance.

I've repeatedly seen the user attribute the resulting high I/O to
autovacuum (which is, after all, always working away trying to keep up)
- and "solving" the problem by further slowing autovacuum.

It is very counter-intuitive that to fix the problem the user needs to
make the background process that's doing the I/O take up *more*
resources, so that other queries take *even less*.
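
A quick way to check whether autovacuum is actually falling behind,
instead of inferring it from I/O graphs (the counts are estimates from
the stats collector):

    -- Tables with the most dead tuples autovacuum hasn't reclaimed yet.
    SELECT relname, n_dead_tup, n_live_tup, last_autovacuum
    FROM pg_stat_user_tables
    ORDER BY n_dead_tup DESC
    LIMIT 10;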

>> ->
>> "I'll whack in some manual VACUUM cron jobs during low load maintenance
>> hours and hope that keeps the worst of the problem away, that's what
>> random forum posts on the Internet say to do".
>> -> "oh my, why did my DB just do an emergency shutdown?"
>
> This one doesn't make much sense to me, unless they mucked around with
> autovacuum_freeze_max_age as well as turning autovacuum itself off
> (common practice?).

Unfortunately, yes, as an extension of the reasoning people seem to
apply around autovacuum, described above. The now horrifyingly bloated
DB is being kept vaguely functional by regular cron'd vacuum runs, but
then autovacuum kicks back in and starts thrashing the system. It's
already performing really badly because of all the bloat, so this is
more than it can take, and performance tanks critically. Particularly
since it probably has 1000 or more backends thrashing away, if it's
anything like many of the systems I've been seeing in the wild.

The operator's response: Panic and find out how to make it stop. Once
autovacuum quits doing its thing the system returns to staggering along
and they go back to planning a hardware upgrade someday, then suddenly
it's emergency wraparound prevention time.

I suspect vacuum, autovacuum, autovacuum tuning, table and index bloat,
etc is just too complicated for a lot of people running Pg installs to
really understand. I'd really, really love to see some feedback-based
auto-tuning of vacuum.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-09 14:44:36
Message-ID: 1370789076.76307.YahooMailNeo@web162902.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 06/07/2013 04:38 AM, Jeff Janes wrote:
>> Craig Ringer <craig(at)2ndquadrant(dot)com>

> The problem is that vacuum running too slowly tends to result in
> table and index bloat, which results in less efficient cache use,
> slower scans, and generally worsening performance.
>
> I've repeatedly seen the user attribute the resulting high I/O to
> autovacuum (which is, after all, always working away trying to
> keep up) - and "solving" the problem by further slowing
> autovacuum.
>
> It is very counter-intuitive that to fix the problem the user
> needs to make the background process that's doing the I/O take up
> *more* resources, so that other queries take *even less*.

Exactly.  It can be very hard to convince someone to make
autovacuum more aggressive when they associate its default
configuration with slowness.

>>> -> "I'll whack in some manual VACUUM cron jobs during low load
>>> maintenance hours and hope that keeps the worst of the problem
>>> away, that's what random forum posts on the Internet say to
>>> do".
>>> -> "oh my, why did my DB just do an emergency shutdown?"
>>
>> This one doesn't make much sense to me, unless they mucked
>> around with autovacuum_freeze_max_age as well as turning
>> autovacuum itself off (common practice?).
>
> Unfortunately, yes, as an extension of the reasoning people seem to
> apply around autovacuum, described above. The now horrifyingly
> bloated DB is being kept vaguely functional by regular cron'd vacuum
> runs, but then autovacuum kicks back in and starts thrashing the
> system. It's already performing really badly because of all the
> bloat, so this is more than it can take, and performance tanks
> critically. Particularly since it probably has 1000 or more
> backends thrashing away, if it's anything like many of the systems
> I've been seeing in the wild.
>
> The operator's response: Panic and find out how to make it stop.
> Once autovacuum quits doing its thing the system returns to
> staggering along and they go back to planning a hardware upgrade
> someday, then suddenly it's emergency wraparound prevention time.

I have seen exactly this pattern multiple times.  They sometimes
completely ignore all advice about turning on and tuning autovacuum
and instead want to know the exact formula for when the
wraparound prevention autovacuum will trigger, so they can run a
vacuum "just in time" to prevent it -- since they believe this will
minimize disk access and thus give them best performance.  They
often take this opportunity to run VACUUM FULL on the table and
don't see the point of following that with any other form of
VACUUM, so they wipe out their visibility map in the process.
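
In SQL terms, the trap looks something like this (table name
hypothetical):

    -- VACUUM FULL rewrites the table into a fresh heap, which starts
    -- out with an empty visibility map.
    VACUUM FULL accounts;

    -- Without a follow-up plain VACUUM, no pages are marked all-visible,
    -- so later vacuums (and index-only scans) can't skip anything.
    VACUUM accounts;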

> I suspect vacuum, autovacuum, autovacuum tuning, table and index
> bloat, etc is just too complicated for a lot of people running Pg
> installs to really understand.

The ones who suffer most are those who learn just enough to think
they know how to tune better than the defaults, but not enough to
really understand the full impact of the changes they are making.
I have no particular ideas on what to do about that observation,
unfortunately.

> I'd really, really love to see some feedback-based auto-tuning of
> vacuum.

+1

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-10 18:34:23
Message-ID: 51B61C2F.7060303@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I suspect vacuum, autovacuum, autovacuum tuning, table and index bloat,
> etc is just too complicated for a lot of people running Pg installs to
> really understand. I'd really, really love to see some feedback-based
> auto-tuning of vacuum.

Heck, it's hard for *me* to understand, and I helped design it. I think
there's no question that it could be vastly improved.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-12 21:43:53
Message-ID: CAMkU=1w-LAcDq9DEysGvNZPJ1F1h78Ujq6ff7qFLD9sE3dzPPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 3, 2013 at 6:42 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On 2013-06-02 11:44:04 -0700, Jeff Janes wrote:
> > Do we know why anti-wraparound uses so many resources in the first place?
> > The default settings seem to be quite conservative to me, even for a
> > system that has only a single 5400 rpm hdd (and even more so for any real
> > production system that would be used for a many-GB database).
>
> I guess the point is that nobody can actually run a bigger OLTP database
> successfully with the default settings. Usually that will end up with a)
> huge amounts of bloat in the tables autovac doesn't scan first b) forced
> shutdowns because autovac doesn't freeze quickly enough.
>

I think Greg Smith posted elsewhere that 4MB/sec of dirtying (which is
what the default settings work out to) was about right for some of his
very busy systems, which seemed like they had pretty impressive IO
subsystems. I was surprised it was so low. Are there other anecdotes about
what settings work well in practice, assuming people ever find ones that
work well?
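
For reference, the ~4MB/sec figure falls straight out of the stock
settings (vacuum_cost_limit = 200, autovacuum_vacuum_cost_delay = 20ms,
vacuum_cost_page_dirty = 20, 8kB pages) - a back-of-envelope check:

    -- Ceiling on pages dirtied per second implied by the defaults.
    SELECT (200.0 / 20)      -- dirty pages allowed per cost cycle
         * (1000.0 / 20)     -- cost cycles per second (20ms naps)
         * 8192              -- bytes per page
         / (1024 * 1024)     -- bytes -> MB
        AS max_dirty_mb_per_sec;  -- ~3.9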

Which raises the question: is the primary problem that there are no
settings that work well for those systems, or that there usually are
such sweet-spot settings but mere mortals cannot find them?

>
> The default suggestion that frequently seems to be made is just to
> disable autovac cost limitations because of that.
>

Is there general agreement that this suggestion is bad? Setting
autovacuum_vacuum_cost_delay to zero is basically saying "I dare you to do
your best to destroy my IO performance." So it is not surprising that this
just moves one from the frying pan to the fire, or maybe the reverse. (The
small ring buffer used by vacuums might save your bacon if your fsyncs
actually need to hit disk, as the constant stream of fsync requests to the
WAL will act as a secondary throttle).

How about recommending that, if autovacuum is not keeping up, the delay
be set to the default divided by the number of spindles? That may still
be overly aggressive, but it is infinitely less aggressive than setting
it to zero would be.
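
Concretely, a per-table version of that suggestion might look like this
(a sketch assuming 4 spindles; table name hypothetical):

    -- Default 20ms delay divided by 4 spindles.
    ALTER TABLE big_busy_table SET (autovacuum_vacuum_cost_delay = 5);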

Cheers,

Jeff


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-12 21:59:42
Message-ID: 20130612215942.GA27432@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-12 14:43:53 -0700, Jeff Janes wrote:
> > The default suggestion that frequently seems to be made is just to
> > disable autovac cost limitations because of that.

> Is there general agreement that this suggestion is bad? Setting
> autovacuum_vacuum_cost_delay to zero is basically saying "I dare you to do
> your best to destroy my IO performance." So it is not surprising that this
> just moves one from the frying pan to the fire, or maybe the reverse.

It sure as heck is better than an anti-wraparound shutdown every week
because autovacuum doesn't finish all relations in time. More often than
not, a large part of the relations has already been frozen, so it won't
dirty all that much.
I think it's actually a quite sensible setting in many situations, given
the state of the current controls.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-12 23:31:16
Message-ID: 1371079876.48038.YahooMailNeo@web162906.mail.bf1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> Are there other anecdotes about what settings work well in
> practise, assuming people ever find ones that work well?

Putting WAL on its own RAID with its own battery-backed cache can
help a lot more than I would have thought -- even with read-only
transactions.

http://www.postgresql.org/message-id/4B71358E020000250002F0E4@gw.wicourts.gov

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jim Nasby <jim(at)nasby(dot)net>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-19 01:11:42
Message-ID: 4489D2DF-85AC-4BC7-931E-BA15D6CB0CF3@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 3, 2013, at 6:45 PM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
> On 06/04/2013 05:27 AM, Peter Geoghegan wrote:
>> On Mon, Jun 3, 2013 at 5:03 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>>> I've seen cases on Stack Overflow and elsewhere in which disk merge
>>> sorts perform vastly better than in-memory quicksort, so the user
>>> benefited from greatly *lowering* work_mem.
>> I've heard of that happening on Oracle, when the external sort is
>> capable of taking advantage of I/O parallelism, but I have a pretty
>> hard time believing that it could happen with Postgres under any
>> circumstances.
> IIRC it's usually occurred with very expensive comparison operations.
>
> I'll see if I can find one of the SO cases.

FWIW, I've definitely seen this behavior in the past, on really old versions (certainly pre-9, possibly pre-8).

IIRC there's some kind of compression or something used with on-disk sorts. If that's correct then I think what's happening is that the "on-disk" sort that fits into cache is actually using less memory than quicksort. Or perhaps it was just a matter of memory locality within each tape. It's been too long since I looked at it. :(
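
If anyone wants to poke at this again, the comparison is easy to set up
(table and column hypothetical; the "Sort Method" line in EXPLAIN ANALYZE
output shows which strategy was used):

    SET work_mem = '1GB';
    EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY expensive_col;
    -- Sort Method: quicksort  Memory: ...

    SET work_mem = '4MB';
    EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY expensive_col;
    -- Sort Method: external merge  Disk: ...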


From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Vacuum, Freeze and Analyze: the big picture
Date: 2013-06-19 03:21:24
Message-ID: CAM3SWZQmHv3jHq-_nUuGUo3gmjfrBS3sRYRzR7e=ZkfMU=jwrA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 18, 2013 at 6:11 PM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> IIRC there's some kind of compression or something used with on-disk sorts.

I think you're mistaken.

> If that's correct then I think what's happening is that the "on-disk" sort that fits into cache
> is actually using less memory than quicksort. Or perhaps it was just a matter of memory
> locality within each tape. It's been too long since I looked at it. :(

External sorts do of course use less memory, but quicksort is
particularly good at taking advantage of memory locality.

I think it's possible that what you recall is from the days when we used
the OS qsort(), and we were at the mercy of the implementation that
the OS provided. When we effectively began to vendor our own sort
routine in 2006, we chose a high-quality one with various protections
against quadratic behaviors. Implementations that lacked these
protections were prevalent at a surprisingly late stage.

--
Peter Geoghegan