Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when

From: alvherre(at)postgresql(dot)org (Alvaro Herrera)
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 03:52:55
Message-ID: 20070924035255.9F991753E4C@cvs.postgresql.org
Lists: pgsql-committers pgsql-hackers

Log Message:
-----------
Reduce the size of memory allocations by lazy vacuum when processing a small
table, by allocating just enough for a hardcoded number of dead tuples per
page. The current estimate is 200 dead tuples per page.

Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent
discussion.
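
For context, a minimal sketch of the sizing rule this commit describes
(illustrative only, not the committed vacuumlazy.c code; the constant and
helper names here are made up for the example):

    #include <stdio.h>

    #define LAZY_ALLOC_TUPLES  200  /* hardcoded dead-tuples-per-page estimate */
    #define TID_SIZE           6    /* bytes per dead-tuple item pointer */

    /* How many dead-tuple slots to allocate for a table of rel_blocks pages,
     * given maintenance_work_mem in kilobytes. */
    static long
    dead_tuple_slots(long maintenance_work_mem_kb, long rel_blocks)
    {
        long    maxtuples = (maintenance_work_mem_kb * 1024L) / TID_SIZE;

        /* Cap at the per-page estimate times the table size, so a small
         * table no longer pins down the whole memory budget. */
        if (maxtuples / LAZY_ALLOC_TUPLES > rel_blocks)
            maxtuples = rel_blocks * LAZY_ALLOC_TUPLES;
        return maxtuples;
    }

    int
    main(void)
    {
        /* With 128MB of maintenance_work_mem, a 10-page table now gets
         * 2000 slots instead of ~22 million. */
        printf("10-page table: %ld slots\n", dead_tuple_slots(131072, 10));
        printf("1M-page table: %ld slots\n", dead_tuple_slots(131072, 1000000));
        return 0;
    }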

Modified Files:
--------------
pgsql/src/backend/commands:
vacuumlazy.c (r1.99 -> r1.100)
(http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/commands/vacuumlazy.c?r1=1.99&r2=1.100)


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Alvaro Herrera" <alvherre(at)postgresql(dot)org>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 09:02:10
Message-ID: 46F77D12.8050408@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

Alvaro Herrera wrote:
> Log Message:
> -----------
> Reduce the size of memory allocations by lazy vacuum when processing a small
> table, by allocating just enough for a hardcoded number of dead tuples per
> page. The current estimate is 200 dead tuples per page.

200 sounds like a badly chosen value. With an 8k block size, that's a bit
less than MaxHeapTuplesPerPage, which means that in the worst case you
don't allocate enough space to hold all dead tuples, and you end up
doing 2 index cleanups, no matter how large you set
maintenance_work_mem. Remember that having MaxHeapTuplesPerPage dead
tuples on a page just got much more likely with HOT, and with larger
block sizes 200 tuples isn't very much anyway.

At the other end of the spectrum, with a smaller block size, 200 is more
than MaxHeapTuplesPerPage, so we're still allocating more than necessary.

Note that as the patch stands, the capping is not limited to small
tables. Doing extra index passes on a relatively big table with lots of
indexes might cause a lot of real extra I/O.

How about just using MaxHeapTuplesPerPage? With the default 8K block
size, it's not that much more than 200, but makes the above gripes
completely go away. That seems like the safest option at this point.

> Per reports from Jeff Amiel, Erik Jones and Marko Kreen, and subsequent
> discussion.

Ok, I just read that discussion in the archives. A lot of good ideas
were suggested, like reducing the space required for the tid list, or
dividing maintenance_work_mem between workers. None of that is going to
happen for 8.3, so it seems likely that we're going to revisit this in
8.4. Let's keep it simple and safe for now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 09:28:22
Message-ID: 1190626102.4181.54.camel@ebony.site
Lists: pgsql-committers pgsql-hackers

On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
> Alvaro Herrera wrote:
> > Log Message:
> > -----------
> > Reduce the size of memory allocations by lazy vacuum when processing a small
> > table, by allocating just enough for a hardcoded number of dead tuples per
> > page. The current estimate is 200 dead tuples per page.
>
> 200 sounds like a badly chosen value. With an 8k block size, that's a bit
> less than MaxHeapTuplesPerPage, which means that in the worst case you
> don't allocate enough space to hold all dead tuples, and you end up
> doing 2 index cleanups, no matter how large you set
> maintenance_work_mem.

Agreed. Tables with 2-4 columns easily fall into that category.
Association tables are often like this and can have 2 indexes on them.

> Note that as the patch stands, the capping is not limited to small
> tables. Doing extra index passes on a relatively big table with lots of
> indexes might cause a lot of real extra I/O.
>
> How about just using MaxHeapTuplesPerPage? With the default 8K block
> size, it's not that much more than 200, but makes the above gripes
> completely go away. That seems like the safest option at this point.

It would be much better to use a value for each table. Any constant
value will be sub-optimal in many cases.

Let's use our knowledge of the table to calculate a sensible value. We
often have average row length available from last VACUUM, don't we? Use
that, plus 10%.
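
A rough sketch of that calculation (hypothetical code, not anything in the
tree; the 24-byte page-header figure and the fallback constant are
assumptions for illustration):

    /* Estimate dead tuples per page for one table: how many rows fit at
     * the average row length seen by the last VACUUM, plus 10% headroom. */
    static long
    est_dead_tuples_per_page(double avg_row_len, long blcksz)
    {
        double  usable = blcksz - 24;   /* roughly, minus the page header */

        if (avg_row_len <= 0)
            return 200;                 /* no stats yet: old constant */
        return (long) (usable / avg_row_len * 1.10) + 1;
    }

For a table of 100-byte rows on 8k blocks that gives about 90 slots per
page rather than a one-size-fits-all 200.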

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)postgresql(dot)org>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 09:39:59
Message-ID: 46F785EF.1070907@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

Simon Riggs wrote:
> On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
>> How about just using MaxHeapTuplesPerPage? With the default 8K block
>> size, it's not that much more than 200, but makes the above gripes
>> completely go away. That seems like the safest option at this point.
>
> It would be much better to use a value for each table. Any constant
> value will be sub-optimal in many cases.

Allocating extra memory doesn't usually do much harm, as long as you
don't actually use it. The reason we're now limiting it is to avoid Out
Of Memory errors if you're running with overcommit turned off, and
autovacuum triggers a vacuum on multiple tables at the same time.

Let's keep it simple. A per-table setting would be much more complex and
would be something that the DBA would need to calculate and set. If you
really do run into this problem, you can just dial down
maintenance_work_mem.

> Let's use our knowledge of the table to calculate a sensible value. We
> often have average row length available from last VACUUM, don't we? Use
> that, plus 10%.

If there are dead line pointers in the table, left behind by HOT pruning,
the average row length would be completely bogus. Using the dead_tuples
stats directly would be more reasonable, but I would like us to keep
this even more conservative than that.
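
To make that concrete (assuming 8k blocks and the usual 4-byte line
pointers): with an average row length of 100 bytes, a row-length-based
estimate predicts roughly 80 dead tuples per page, but a dead line pointer
left behind by pruning occupies only 4 bytes of that space, so a heavily
pruned page can accumulate up to MaxHeapTuplesPerPage (291) dead items --
several times the estimate.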

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 10:23:01
Message-ID: 1190629381.4181.96.camel@ebony.site
Lists: pgsql-committers pgsql-hackers

On Mon, 2007-09-24 at 10:39 +0100, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
> >> How about just using MaxHeapTuplesPerPage? With the default 8K block
> >> size, it's not that much more than 200, but makes the above gripes
> >> completely go away. That seems like the safest option at this point.
> >
> > It would be much better to use a value for each table. Any constant
> > value will be sub-optimal in many cases.
>
> Allocating extra memory doesn't usually do much harm, as long as you
> don't actually use it. The reason we're now limiting it is to avoid Out
> Of Memory errors if you're running with overcommit turned off, and
> autovacuum triggers a vacuum on multiple tables at the same time.
>
> Let's keep it simple. A per-table setting would be much more complex and
> would be something that the DBA would need to calculate and set. If you
> really do run into this problem, you can just dial down
> maintenance_work_mem.

Much more complex? RelationGetAvgFSM(relation)??

> > Let's use our knowledge of the table to calculate a sensible value. We
> > often have average row length available from last VACUUM, don't we? Use
> > that, plus 10%.
>
> If there's dead line pointers in the table, left behind by HOT pruning,
> the average row length would be completely bogus. Using the dead_tuples
> stats directly would be more reasonable, but I would like us to keep
> this even more conservative than that.

That's a better argument.

Since we have committed HOT, I would suggest we move the default of
autovacuum_max_workers down to 2. That will limit the default amount of
memory used by VACUUMs much better than trying to calculate the values
precisely, and we expect vacuums to run much less frequently now anyway.
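
(For scale: each worker can allocate up to maintenance_work_mem for its
dead-tuple list, so the worst case is roughly autovacuum_max_workers times
maintenance_work_mem. With, say, maintenance_work_mem = 128MB, the default
of 3 workers allows ~384MB; 2 workers would cap it at ~256MB.)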

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Alvaro Herrera" <alvherre(at)postgresql(dot)org>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 10:34:23
Message-ID: 46F792AF.8010705@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

Simon Riggs wrote:
> On Mon, 2007-09-24 at 10:39 +0100, Heikki Linnakangas wrote:
>> Let's keep it simple. A per-table setting would be much more complex and
>> would be something that the DBA would need to calculate and set. If you
>> really do run into this problem, you can just dial down
>> maintenance_work_mem.
>
> Much more complex? RelationGetAvgFSM(relation)??

The code isn't complex, but the resulting behavior is.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Alvaro Herrera" <alvherre(at)postgresql(dot)org>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 11:48:01
Message-ID: 87myvcffny.fsf@oxford.xeocode.com
Lists: pgsql-committers pgsql-hackers

"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:

> Simon Riggs wrote:
>> On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
>>> How about just using MaxHeapTuplesPerPage? With the default 8K block
>>> size, it's not that much more than 200, but makes the above gripes
>>> completely go away. That seems like the safest option at this point.
>>
>> It would be much better to use a value for each table. Any constant
>> value will be sub-optimal in many cases.
>
> Allocating extra memory doesn't usually do much harm, as long as you
> don't actually use it. The reason we're now limiting it is to avoid Out
> Of Memory errors if you're running with overcommit turned off, and
> autovacuum triggers a vacuum on multiple tables at the same time.

For reference, MaxHeapTuplesPerPage on an 8k block is 291. If your tuples
contain any data at all (meaning they are neither HOT updates that have
been pruned nor rows with 8 or fewer columns, all of them null), then the
most you can have is 255 rows.

For the small difference between 200 and 291 it seems safer to just use
MaxHeapTuplesPerPage.

Block size   MaxHeapTuplesPerPage   Max with data
----------   --------------------   -------------
      4096                    145             127
      8192                    291             255
     16384                    584             511
     32768                   1169            1023
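
Those figures follow directly from the page-layout constants; a
self-contained sketch that reproduces the MaxHeapTuplesPerPage column
(assuming 8-byte MAXALIGN, a 24-byte page header, a 23-byte heap tuple
header and 4-byte line pointers):

    #include <stdio.h>

    #define MAXALIGN(x)  (((x) + 7) & ~(long) 7)   /* assume 8-byte alignment */

    int
    main(void)
    {
        const long  page_header = 24;    /* SizeOfPageHeaderData */
        const long  tuple_header = 23;   /* SizeofHeapTupleHeader */
        const long  line_pointer = 4;    /* sizeof(ItemIdData) */
        long        bs;

        for (bs = 4096; bs <= 32768; bs *= 2)
            printf("%5ld -> %ld\n", bs,
                   (bs - page_header) / (MAXALIGN(tuple_header) + line_pointer));
        return 0;
    }

which prints 145, 291, 584 and 1169 for the four block sizes.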

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-24 15:26:54
Message-ID: 20070924152653.GB6353@alvh.no-ip.org
Lists: pgsql-committers pgsql-hackers

Gregory Stark wrote:
> "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
>
> > Simon Riggs wrote:
> >> On Mon, 2007-09-24 at 10:02 +0100, Heikki Linnakangas wrote:
> >>> How about just using MaxHeapTuplesPerPage? With the default 8K block
> >>> size, it's not that much more than 200, but makes the above gripes
> >>> completely go away. That seems like the safest option at this point.
> >>
> >> It would be much better to use a value for each table. Any constant
> >> value will be sub-optimal in many cases.
> >
> > Allocating extra memory doesn't usually do much harm, as long as you
> > don't actually use it. The reason we're now limiting it is to avoid Out
> > Of Memory errors if you're running with overcommit turned off, and
> > autovacuum triggers a vacuum on multiple tables at the same time.
>
> For reference, MaxHeapTuplesPerPage on an 8k block is 291. If your tuples
> contain any data at all (meaning they are neither HOT updates that have
> been pruned nor rows with 8 or fewer columns, all of them null), then the
> most you can have is 255 rows.

How about we change it to MaxHeapTuplesPerPage for now? That closes all
complaints in this thread.

Later we can think about better ways of doing the whole thing, like
using non-lossy tidbitmaps.
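
(Back-of-the-envelope, assuming 6-byte item pointers and 8k blocks: the
current array costs 6 bytes per dead tuple, up to 291 * 6 ≈ 1.7kB for a
fully dead page, whereas a non-lossy per-page bitmap needs only one bit per
possible line pointer -- about 37 bytes per page plus the block number.)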

--
Alvaro Herrera Developer, http://www.PostgreSQL.org/
"Para tener más hay que desear menos"


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce the size of memory allocations by lazy vacuum when
Date: 2007-09-26 20:54:23
Message-ID: 20070926205423.GJ8572@alvh.no-ip.org
Lists: pgsql-committers pgsql-hackers

Alvaro Herrera wrote:

> How about we change it to MaxHeapTuplesPerPage for now. That closes all
> complaints in this thread.

Done, thanks for the input!

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La rebeldía es la virtud original del hombre" (Arthur Schopenhauer)